I. INTRODUCTION

For several years the ability of systems engineers to
design custom digital integrated circuits has been growing.
The Mead and Conway design methodology, described in
Introduction to VLSI Systems [Ref. 1], permits the systems
engineer to be his own logic circuit designer. A proliferation
of computer-aided design (CAD) systems such as the
MacPitts silicon compiler [Ref. 2], the chip layout language
(CLL) [Ref. 3], the graphics editor Caesar [Ref. 4], and the
Burlap hierarchical layout language [Ref. 5] makes it
possible for the engineer to rapidly carry the Mead and
Conway design methodology through to a final design. This
includes iterative simulation and redesign to provide
justifiable confidence in the final design submitted for
fabrication.

Many of the techniques utilized in the Mead and Conway 
methodology and most of the CAD tools are based on having 
the final design implemented in a technology that uses only 
one type of doping for the semiconductor material in the
active region of the transistors. Because of their higher 
switching speed, negatively doped metal oxide semiconductor 
(NMOS) transistor technologies are generally used. 

Selection of an NMOS implementation technology does 
provide the systems engineer with a complete and proven 
methodology for the design of a very large scale integrated 
(VLSI) circuit and allows the use of many extensively tested 
CAD tools. Like any other design decision, selection of 
NMOS implementation brings with it some limitations. There 
are two primary problems associated with NMOS digital 
circuits. The first is the ultimate switching speed limitation.
Though many NMOS VLSI circuits operate at clock rates in the
8 to 10 MHz range, there are many applications requiring
higher clock rates. The second problem is the dissipation
of the relatively large amount of power consumed by NMOS
digital circuits. State of the art, commercially available
NMOS VLSI circuits commonly have power consumptions in the
vicinity of 3 to 5 watts. Considerable design effort is
required to ensure that the dissipation of this much power
by a chip measuring approximately 5 millimeters on a side
does not alter the performance of the micron sized features
on the chip.

One group of technologies that offers both increased 
switching speed and greatly reduced power consumption is 
complementary metal oxide semiconductors (CMOS). CMOS
circuits also offer the benefits of greater radiation
hardening and increased noise margin. In this thesis
investigation, much of the Mead and Conway methodology was utilized
in the design of a CMOS circuit. A general purpose color 
graphics CAD tool called Caesar that has been frequently 
used in the design of NMOS circuits was employed. In 
carrying out the design of the 16 bit pipelined high speed 
adder in CMOS, two separate goals were pursued. The first,
of course, is speed and the second is verifiability. A high 
speed adder implies not only a high clock rate of operation 
but also a small latency between input of operands and 
output of the sum. 

A discussion of CMOS technologies and the implementation 
of logic circuits in those technologies follows in Chapter 
2. Chapter 3 presents a description of the CAD tools used 
to construct and simulate the layout for the adder. The 
logic and layout design of the adder is covered in Chapter 4 
and is followed by a test plan for the fabricated chip in 
Chapter 5.

II. CMOS CIRCUITS



Before the design of CMOS digital circuits can be 
attempted, an understanding of how to best implement logic 
functions in CMOS is necessary. It is also important to be 
aware of the advantages and disadvantages of the different 
CMOS implementation technologies. In this chapter the
operation of CMOS digital circuits is explained using similar
NMOS circuits as a benchmark for comparison. The different 
methodologies for assembling the CMOS pieces to produce the 
desired logical results are reviewed and the selection of 
the CMOS-Bulk p-well implementation technology is explained. 

A. COMPARISON WITH NMOS

In NMOS digital circuits there is only one type of 
switching device, namely the n-channel enhancement mode 
metal oxide semiconductor (MOS) transistor. The other prin- 
cipal device utilized in NMOS circuits is the depletion mode 
n-channel MOS device which acts as a load resistor. In CMOS 
there are both n-channel and p-channel enhancement mode 
transistors available. As in NMOS, the n-channel device can 
be considered on when Vdd (typically +5 Volts DC), a logical
1, is present on its gate. The p-channel device can be
considered on when ground (GND), a logical 0, is present on
its gate. Figure 2.1 shows the symbols that will be used
for the n-channel and p-channel transistors in this thesis.

The basic differences between NMOS and CMOS technologies 
can be demonstrated by comparing their application to some 
basic digital circuits. 



Figure 2.1 CMOS Transistor Symbols.

1. The Inverter

Figure 2.2 (a) shows an NMOS inverter. Whenever
there is a logical 1 on the input, the voltage drop across
the load resistor is approximately Vdd and the output is a
logical 0. This results in steady state power consumption.
When the input switches to a logical 0, before the output
can assume a logical 1, the load capacitance (Cl) on the
output must be charged to Vdd through the load resistor with
a resistance of several kilohms. This results in a much
longer transition from 0 to 1, where the load capacitance is
charged through the load resistor, than from 1 to 0, where
the load capacitance is discharged through the switched on
NMOS enhancement transistor. The reason for this asymmetry
is that the pull-down transistor’s on resistance is
typically only one fourth or less of the on resistance of
the pull-up depletion mode load transistor. The technique
of precharging circuits, where all outputs are set to
logical 1 during one clock cycle and then selectively forced
to 0 on the opposite (evaluation) clock cycle, has proven
helpful in gaining control over the asymmetric switching
times. This longer switching time from 0 to 1 must still be
accounted for, however, and represents the primary
limitation to the speed of NMOS circuits.
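
The asymmetry described above can be illustrated with a first-order RC
estimate. The following is only a rough sketch: it treats each device
as a fixed resistance, and the pull-up resistance (8 kilohms) and load
capacitance (0.1 pF) are assumed values chosen for illustration, not
figures from this thesis. Only the 4:1 resistance ratio comes from the
text.

```python
import math


def rc_transition_time(resistance_ohms, capacitance_farads, fraction=0.9):
    """Time for an RC node to complete `fraction` of its voltage swing."""
    return -resistance_ohms * capacitance_farads * math.log(1.0 - fraction)


# Assumed illustrative values (not taken from the thesis).
R_PULL_UP = 8e3                # depletion mode load, "several kilohms"
R_PULL_DOWN = R_PULL_UP / 4.0  # enhancement transistor, one fourth of the load
C_LOAD = 0.1e-12               # load capacitance Cl in farads

rise = rc_transition_time(R_PULL_UP, C_LOAD)    # 0 to 1, through the load
fall = rc_transition_time(R_PULL_DOWN, C_LOAD)  # 1 to 0, through the pull-down

print("rise time (s):", rise)
print("fall time (s):", fall)
print("rise/fall ratio:", rise / fall)
```

In this simplified model the ratio of rise time to fall time equals the
ratio of the two resistances, independent of the capacitance chosen.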



Figure 2.2 (a) NMOS Inverter (b) CMOS Inverter.

In the CMOS inverter of Figure 2.2 (b) the input is 
applied to the gates of both devices. An input of logical 1 
causes the n-channel device to switch on and the p-channel 
device to switch off, resulting in an output of logical 0. 
Similarly, an input of 0 results in an output of 1. In both 
cases, one device is fully off, representing a resistance on 
the order of gigaohms. Thus, the steady state power 
consumption is essentially zero. In operation the only 
power consumption of consequence occurs during the tran- 
sition when neither transistor is fully on or off. 
Additionally, since the output load capacitance is both 
charged and discharged through a turned on transistor, the 1 
to 0 and 0 to 1 switching delays are theoretically the same. 

Actually the switching delays depend on many parameters.
The n-channel and p-channel device dimensions are
frequently not the same, and the mobility of the electrons in
the n-channel is greater than the mobility of the holes in
the p-channel. Also, the capacitive load seen by the
p-channel device in CMOS p-well (CMOS-pw) is greater than
the load seen by the n-channel device because of the highly
doped p-well. Typically, the result in CMOS-pw is a
slightly longer transition time of the 0 to 1 output
transition. Some designers attempt to compensate for this by
consistently making the p-channel transistors wider than the
n-channel transistors.

Unlike NMOS, the output of a CMOS digital circuit 
makes a full excursion between Vdd and GND. This makes CMOS 
circuits less sensitive to noise than NMOS circuits. CMOS 
should also benefit more from future reductions in feature 
size. NMOS is more restricted in ultimate feature size
because the power dissipation requirements of the depletion 
mode devices will create more problems as feature sizes 
shrink. In Figure 2.3 the relative sizes of minimum dimen- 
sion inverters implemented in currently available 3 micron 
feature size CMOS-PW and NMOS technologies are shown. 

2. The NOR Gate and Transmission Gate

Figure 2.4 shows the circuit diagrams and layouts of 
a two-input NOR gate implemented in both CMOS-PW and NMOS. 
From Figures 2.3 and 2.4 it is evident that static 1 CMOS 
gates are more complex and area consuming than their NMOS 
counterparts. In these fully complementary circuits a 
redundancy in the structures is evident. The pull-up only 
or pull-down only would be sufficient to implement the
logic. In the CMOS circuits of Figures 2.3 and 2.4 the 
inputs must perform two tasks. A logical 1 on an input 
causes both a connection between the output and ground and a 



1 Static logic circuits continuously evaluate their 
inputs and produce their specified logic output. Dynamic 
circuits perform logical evaluation of the inputs only when
directed to do so by control signals and/or clock signals.

Figure 2.3 Minimum Dimension Inverters.

disconnection between the output and Vdd. Logically these 
two actions are equivalent, therefore only one action should 
be necessary to implement the logic. Design methodologies 
to accomplish this are described in section B of this 
chapter. The parallelism of the CMOS transmission gate of 
Figure 2.5 and the NMOS pass transistor is evident. The 
major difference lies in the bilateral nature of the CMOS 
transmission gate. It is made up of both n-channel and 
p-channel devices and requires both polarities of the 
control signal for operation. The reason for this bilateral 
requirement is that the p-channel device does not transmit 
low voltages well and the n-channel device does not transmit
high voltages well. The resulting unpredictable voltage
drops make it necessary to utilize both types of
transistors. This increase in complexity over its NMOS counterpart
is partially offset by the absence of the level restoring
circuitry NMOS requires following a pass transistor. 2



2 In NMOS digital circuits the length to width ratio of 
the pull down transistor is usually four times that of the 
depletion mode transistor load. This ratio is required to 
ensure sufficient excursion of the output voltage. However,
after a pass transistor is used, a ratio of 8:1 rather than 
4:1 must be used to restore the VGS threshold voltage drop 
across the pass transistor. 



Figure 2.5 CMOS Transmission Gate.

In general CMOS technologies are ratioless. The use
of "improper" ratios will not affect the logical operation
of most CMOS gates; it will only affect the speed of
operation of the gates.

B. CMOS DESIGN METHODOLOGIES 

Static gate CMOS circuits have three serious deficien- 
cies when compared to static NMOS gates. First, they are 
more area consuming. Second, they can be slower. Though 
the individual gates can be faster in CMOS, the p-channel 
and n-channel gates are in parallel, thus, the fanout 3 and 
the output load capacitance of each circuit are doubled.
Third, a CMOS static gate is redundant, duplicating its 
functionality in both the pull-up and pull-down section. 

One approach to remedy these deficiencies is to use a 
static NMOS-like style of design as in Figure 2.6. Here the
p-channel device is always on and the pull-up to pull-down 
dimension ratio is relied upon to produce the proper output 
voltage. This introduces power consumption problems and 
takes away the full excursion on the output. Another 



3 Fanout represents the number of transistors that the
output of a logic gate must drive.

Figure 2.6 NMOS-like CMOS Static Gate [Ref. 6].

approach is to make extensive use of transmission gates to 
build up logic functions. Using transmission gates means 
both polarities of all control signals are required. The 
resulting large number of wires required to route these
control signals can become very area consuming, especially 
if only one metal layer is available. 

A third and more effective solution is to use dynamic 
logic. Figure 2.7 contains three different implementations 
of a dynamic three-input NAND gate. In each, the output is
meaningful (i.e. represents the NAND of in1, in2, and in3)
only when clk is high and its complement is low. The
circuits of Figure 2.7 (a) and (b) depend on the pull-up to
pull-down ratio to produce the proper output. As with the
NMOS-like style of design, full excursion on the output is
lost and there is steady state power consumption during the
evaluation cycle. The circuit in Figure 2.7 (c) is
precharged when clk is low and evaluation of the inputs takes
place when clk is high. This configuration allows only one
change of the output from 1 to 0, so the inputs must be
stable at the time clk goes high. A change of one of the
inputs from 1 to 0 after clk has gone high cannot cause the
output to return to 1.
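
The precharge and evaluate behavior of the circuit in Figure 2.7 (c)
can be sketched as a small behavioral model. This is an illustration
only: the class name and the 0/1 signal encoding are assumptions made
for the example, and the model ignores charge leakage and timing.

```python
class DynamicNand3:
    """Behavioral model of a precharged three-input NAND gate."""

    def __init__(self):
        self.out = 1  # output node starts precharged

    def step(self, clk, in1, in2, in3):
        if clk == 0:
            # Precharge phase: the output is pulled to logical 1.
            self.out = 1
        elif in1 and in2 and in3:
            # Evaluate phase: all inputs high discharge the output.
            self.out = 0
        # Otherwise the node keeps its charge. Once discharged, it
        # cannot return to 1 until the next precharge.
        return self.out


gate = DynamicNand3()
gate.step(0, 1, 1, 1)           # precharge
first = gate.step(1, 1, 1, 1)   # evaluate with all inputs high
second = gate.step(1, 0, 1, 1)  # an input falls late in the evaluation
third = gate.step(0, 0, 1, 1)   # next precharge
print(first, second, third)
```

The second call during evaluation shows why the inputs must be stable
when clk goes high: the late change on in1 has no effect on the
discharged output.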

In general dynamic CMOS eliminates the redundancy of
static CMOS by applying all inputs to one type of device and
a control signal to the other type of device. The most
popular dynamic CMOS logic design technique is domino CMOS
[Ref. 7], illustrated in Figure 2.8. Here the output is the
logical AND of the Boolean function (in1 in2 + in3) to be
implemented and a control (clock) signal. When the clock is
low, the circuit is precharged, and when the clock is high
evaluation occurs. With a common clock shared by all the
domino gates on a chip, during the evaluation cycle the
signals ripple through the chip as though the logic were
purely static. The follow-on inverter ensures that the
output of each gate is low when evaluation begins. This
prevents the outputs of all gates from changing unless
driven low by the inputs.

Figure 2.7 Dynamic NAND Gates [Ref. 6].

Figure 2.8 Domino CMOS Structure [Ref. 6].

Domino CMOS is not always the answer though. If the logic
of Figure 2.9 were implemented in domino CMOS it would be
more area consuming than the same circuit implemented in
static CMOS. Dynamic CMOS is more area consuming in this
case because these are simple gates with only a few inputs.
Each NOR gate, if implemented statically, would need two
n-channel devices and two p-channel devices. If implemented
dynamically, each NOR gate requires three transistors of one
type (one for each input and one for the control signal) and
one transistor of the other type (for the control signal
again). The number of transistors needed remains the same
but the dynamic logic requires the designer to keep three
inputs electrically isolated instead of just two. And if
the dynamic design technique is domino, six additional
inverters will be needed. As can be seen in Figure 2.4, in
CMOS a NOR gate can be constructed from just one stage.
Adding the follow-on inverter of the domino design results
in an OR gate. Thus a second inverter is required to return
the logic to that of a NOR gate.
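
The device counts in the preceding paragraph can be tallied with a
short sketch. It assumes, as the phrase "six additional inverters"
suggests, that the circuit of Figure 2.9 contains three two-input NOR
gates, and that each CMOS inverter uses two transistors. The function
names are hypothetical.

```python
def static_nor_transistors(n_inputs):
    # One n-channel and one p-channel device per input.
    return 2 * n_inputs


def dynamic_nor_transistors(n_inputs):
    # One device per input plus one clocked device of the same type,
    # and one clocked device of the other type.
    return (n_inputs + 1) + 1


def domino_nor_transistors(n_inputs):
    # Dynamic stage, the follow-on inverter, and a second inverter to
    # turn the resulting OR back into a NOR (two devices per inverter).
    return dynamic_nor_transistors(n_inputs) + 2 * 2


GATES = 3   # assumed number of NOR gates in Figure 2.9
INPUTS = 2  # two-input NOR gates, as described in the text

totals = {
    "static": GATES * static_nor_transistors(INPUTS),
    "dynamic": GATES * dynamic_nor_transistors(INPUTS),
    "domino": GATES * domino_nor_transistors(INPUTS),
}
for style, count in totals.items():
    print(style, count)
```

Under these assumptions the static and plain dynamic versions need the
same number of devices, while the domino version needs twice as many.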




Figure 2.9 Circuit Difficult to Implement in Domino CMOS. 



C. CMOS IMPLEMENTATION TECHNOLOGIES 

One of the principal issues in the design of a process 
to implement CMOS digital circuits in silicon is how to 
isolate the two types of devices. This can be accomplished 
by using a completely insulating substrate or through a more 
complex fabrication process.

1. CMOS-SOS



The only process currently offered by Metal-Oxide 
Semiconductor Implementation Service (MOSIS) which uses an 
electrically insulating substrate is Silicon on Sapphire 
(SOS). In this technology the n-channel and p-channel
transistors are formed on silicon islands left after etching an
epitaxial layer of silicon on a sapphire (Al2O3) substrate.

2. CMOS-Bulk

The other CMOS processes offered by MOSIS all use
CMOS-Bulk p-well technology. The p-well processes differ in
the number of layers of metal interconnections (1 or 2) and
the presence or absence of capacitors. In CMOS-Bulk p-well
(n-well) the substrate is n-doped (p-doped) and the
p-channel (n-channel) devices are in this substrate. To
isolate the n-channel (p-channel) devices from the substrate
a heavily doped p-well (n-well) is first placed to act as
the back gate. The heavy doping of the p-well (n-well) 
degrades the performance of the n-channel (p-channel) device 
while the p-channel (n-channel) device is optimized. In 
p-well CMOS, though the mobility of electrons in the 
n-channel device still exceeds that of the holes in the 
p-channel device, the performance difference of the transis- 
tors is minimized. The more uniform performance of the two 
transistor types makes the p-well process appropriate for 
CMOS random logic. 

Figures 2.10 and 2.11 represent the top and side
views of the steps of the CMOS-pw process for the production
of an inverter. These steps are: (1) starting with an
n-type substrate, the p-well is patterned, (2) the active
areas in the p-well and on the substrate are established,
(3) the polysilicon is patterned, (4) the two ion implant
masks are placed (the N+ mask is simply the photographic
negative of the P+ mask), (5) contact cuts are made, and
(6) the metal is placed.

a. Latchup in CMOS-pw 

One of the main problems associated with
CMOS-Bulk, both p-well and n-well, is latchup. Basically
latchup involves generation of a short circuit between Vdd
and GND, and can result in the complete destruction of a
chip. Many researchers have tried to formally define the
conditions that cause latchup to occur [Ref. 8]. This task
is extremely complex because the phenomenon is so dependent
on layout, which is unique to each chip design. Though a
fully quantitative analysis of latchup is still not
available, a qualitative analysis will show what happens on the
chip when latchup occurs.

Figure 2.10 P-Well Process, Top View [Ref. 6].

Figure 2.11 P-Well Process, Side View [Ref. 9].

Looking at the side view of an inverter in
Figure 2.12, parasitic bipolar transistors can be seen. The
base of the npn transistor is the p-well and the base of the
pnp transistor is the n-doped substrate. These parasitic
transistors are connected as shown in Figure 2.13. If the
output of the gates goes below GND by a value equal to the
threshold of the npn transistor, its emitter starts to
inject current (electrons) into the base (p-well) and the
resultant collector current flows to the Vdd node. If the
resistance between the Vdd node and the source of the
pull-up p-channel MOS transistor, R1, is large enough, the
voltage drop across R1 will exceed the threshold of the pnp
transistor. The collector current (holes) of the pnp device
flows to the GND node. If the resistance between the GND
node and the source of the pull-down n-channel MOS
transistor, R2, is great enough, the resultant voltage drop
across R2 will increase the base current in the npn
transistor. As is evident, there is positive feedback.
The only way to stop this destructive process once it has
started is to disconnect Vdd or GND. Prevention of latchup
must be designed in.
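
The two conditions in the qualitative description above can be written
as simple threshold tests. This is a sketch only, not a latchup model:
it assumes a fixed 0.7 V turn-on voltage for both parasitic bipolar
devices, the currents and resistances are made-up values, and the
function name is hypothetical. Real latchup behavior depends strongly
on layout.

```python
V_BE_ON = 0.7  # assumed turn-on voltage of a parasitic junction, in volts


def latchup_path_closed(i_npn_collector, r1, i_pnp_collector, r2):
    """Return True when both resistive drops can drive the other device."""
    pnp_turns_on = i_npn_collector * r1 >= V_BE_ON    # drop across R1
    npn_reinforced = i_pnp_collector * r2 >= V_BE_ON  # drop across R2
    return pnp_turns_on and npn_reinforced


# Illustrative numbers only: 1 mA collector currents.
high_resistance = latchup_path_closed(1e-3, 1000.0, 1e-3, 1000.0)
reduced_r2 = latchup_path_closed(1e-3, 1000.0, 1e-3, 100.0)
print(high_resistance, reduced_r2)
```

In this simplified picture, lowering either resistance far enough
breaks the feedback loop.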




Figure 2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6].




Figure 2.13 The Latchup Circuit [Ref. 6]. 

The MOSIS CMOS-Bulk p-well design rules include
features for the specific purpose of reducing the
probability of latchup. The minimum separation rules for
p-wells and P+ doped active areas exist for this purpose.
Their aim is to reduce the gain of the parasitic bipolar
transistors, thus requiring a larger noise spike of longer
duration to start the latchup sequence. A frequently used
technique is the grounding of the p-well as illustrated in
Figure 2.14. Here the effect of the P+ doped area covering
half of the contact cut for the ground bus is to reduce the
resistance R2 in Figure 2.13. Another practice is to place
a small capacitor across the Vdd and GND pins of CMOS-Bulk 
chips. To provide capacitive filtering of noise spikes on 
the chip, Vdd and GND busses are frequently run close 
together. Also, Vdd input pads are designed to provide 
capacitance between Vdd and GND. 




Figure 2.14 Grounding of the P-Well.

substrate, or in an epitaxial layer of silicon on a P+ or N+
wafer. Since the well doping does not have to overcome the 
substrate doping, both the n-channel transistors in the 
p-well and the p-channel transistors in the n-well can be 
optimized. Domino CMOS is enhanced by the use of this 
process since the optimized n-channel devices can speed up 
the complex boolean expression evaluation and the optimized 
p-channel devices can speed up the signal drive between 
stages (thereby reducing the effect of a given fanout). 

D. CMOS TECHNOLOGY SELECTION 

The CMOS implementation technologies available from 
MOSIS are CMOS-Bulk p-well with one metal layer, CMOS-Bulk 
p-well with two metal layers, CMOS-Bulk p-well with two 
metal layers and capacitors (for analog circuits) and 
CMOS-SOS. 

The advantages of CMOS-Bulk are: (1) very good noise
margin, (2) faster than NMOS, and (3) a proven reliable
fabrication process. Its disadvantages are: (1) latchup
susceptibility, (2) use of p-well guard rings is needed if
radiation hardening is desired, (3) lower circuit density
than NMOS or CMOS-SOS, and (4) more complex design rules
than either NMOS or CMOS-SOS.

The advantages of CMOS-SOS are: (1) faster than NMOS or
CMOS-Bulk, (2) very good noise margin, (3) intrinsically
radiation hardened, and (4) no latchup. Its disadvantages
are: (1) expensive fabrication process due to the sapphire,
(2) sapphire variability reduces the reliability of the
fabrication process, (3) thermal mismatch between the
sapphire and silicon limits the carrier mobility, and (4) it
is not a viable technology for dynamic memory due to back
channel leakage.

CMOS-Bulk p-well was selected as the implementation
process for the adder for the following reasons. First, 
technology files for this process were available at the 
Naval Postgraduate School (NPS) enabling the use of extant 
computer aided design (CAD) tools. Second, since this would 
be the first CMOS VLSI design at NPS, utilizing the most 
reliable process is prudent to prevent design problems from 
being clouded by implementation process problems.

III. DESIGN TOOLS



To employ the Mead-Conway design methodology on a large 
scale design, three computer aided design (CAD) tools are 
needed. A layout design editor for viewing the circuits as 
they are created is the first tool required. Next, a design
rule checker is necessary to confirm that all the design 
rules for the specified technology have been adhered to. 
Though not a complex task, the large number of checks that 
must be made for even a modest design makes manual design 
rule checking highly error prone. Finally, a circuit simu- 
lator is needed to verify that the circuit as designed 
provides the proper logical output. In the design of the 
sixteen-bit pipelined adder, the Caesar layout editor 
[Ref. 4], the Lyra design rule checker [Ref. 10], and C. 
Terman’s RNL circuit simulator [Ref. 11] were employed.

A. CAESAR 

Caesar is a generic layout editor. It is not designed 
for any particular VLSI implementation technology. It is 
not even limited to designing integrated circuits. Caesar 
is a graphics layout editor for the creation and manipula- 
tion of rectangles where the user specifies the color, size, 
and placement. It is through the user specified technology 
file that the rectangles of color take on meaning. At the 
Naval Postgraduate School (NPS) there are two technology 
files available for use with Caesar. One is for N-doped
metal oxide semiconductors (NMOS) and the other is for
complementary metal oxide semiconductors utilizing a P-doped
well (CMOS-pw).

Caesar works with files of its own special format.
These files are indicated by an appended file type of .ca (i.e.
xxxx.ca). On command Caesar will generate a Caltech
Intermediate Format (CIF) file of the same layout. Again it
is the technology file which tells Caesar which CIF layer
labels to attach to the colored rectangles.

At NPS, Caesar is set up to take commands from any
terminal where the execution of the Caesar program is
initiated (usually the ADM-3A console adjacent to the color
graphics display unit) and from a four-button puck on a
graphics tablet attached to the color display device.
Caesar displays its graphics results on an AED 767 color 
monitor and displays its menus, messages, and prompts on the 
command console. Detailed information on the installation 
and operation of Caesar at NPS can be found in Reference 4 
and Reference 2. 

Caesar is an interactive CAD tool. The results of any
command are rapidly displayed on the AED 767. The results
of a command may be undone (u) or repeated (.) with a single
stroke of the specified key on the command console. While
running Caesar, a user may also call upon the design rule 
checker, Lyra, to check the area inside and within three 
Caesar units 4 of the current box for design rule violations. 
This interactive use of the layout graphics display and the 
design rule checker helps to ensure that there will not be
any design rule forced changes late in the design cycle when 
changes are much more time consuming. With Caesar’s level 
of interaction with the designer, the design loop consisting 
of (1) issue commands to perturb existing circuit, (2) 
visual inspection to verify command's generation of desired 



4 A Caesar design is laid out on a grid of Caesar units.
These units do not represent any specific length. When
creating a CIF file from a Caesar file the desired length of
a Caesar unit is specified.

results, and (3) design rule checking of new circuit, can be
rapidly completed. 

Caesar is a hierarchical design tool. With Caesar,
circuits can be created by piecing together cells (other
files of type .ca) which in turn may be made up of other
sub-cells. Theoretically, there is no limit to the number
of levels in the hierarchy. Not only can cells (sub-cells,
etc.) be called upon to fill locations in a circuit, but if
they need to be modified to function properly, Caesar provides
a subedit mode to facilitate editing of layouts one level
below the current editing level. Care must be taken when
this subedit feature is used since the changes made to the 
cell are global. Everywhere the given cell is used on the 
chip, the newly edited version will appear. 

B. LYRA

Like Caesar, Lyra is a generic design rule checker.
When Lyra is invoked from within Caesar, the actual program 
executed to check for design rule errors depends on the 
technology file indicated in the header of the Caesar file 
being edited. After running, Lyra sends a message to the 
command console indicating the number of errors found. On 
the graphics display Lyra paints the exact location of each 
error and labels each error with the design rule violated. 
The error label consists of abbreviations for the layers 
involved, followed by an underscore, followed by an abbrevi- 
ation for the type of violation detected. Table 1 lists the 
abbreviations used by Lyra for CMOS-pw. 
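
The label format described above (layer abbreviations, an underscore,
then a violation abbreviation) can be decoded with a few lines of
code. This is a sketch under stated assumptions: the label "m_S" is a
made-up example, the function name is hypothetical, and the code "W"
for minimum width is an assumption because that character is not
legible in the source table.

```python
# Violation abbreviations from Table 1. "W" is an assumed reading of an
# illegible character in the source; "S" and "X" are as printed.
ERROR_TYPES = {
    "W": "minimum width",
    "S": "minimum separation",
    "X": "malformed transistor",
}


def parse_lyra_label(label):
    """Split a Lyra error label into (layer abbreviations, violation)."""
    layers, separator, code = label.rpartition("_")
    if not separator or not layers or not code:
        raise ValueError("expected '<layers>_<violation>': %r" % label)
    return layers, ERROR_TYPES.get(code, "unknown violation")


example = parse_lyra_label("m_S")  # hypothetical label: metal separation
print(example)
```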

TABLE 1

Lyra Error Abbreviations

Layer          Abbreviation    Error                  Abbreviation
polysilicon    P               minimum width          W
metal          m               minimum separation     S
p-well         w               malformed transistor   X
n+ diffusion   d
cut            c
p+ diffusion   P

The winter 1983 distribution of the University of
California at Berkeley (UCB) CAD tools included two versions
of Lyra: one for the Mead-Conway NMOS design rules and the
other for the Jet Propulsion Laboratory’s (JPL) five-micron
feature size CMOS-pw design rules. Since MOSIS no longer
supports fabrication of the JPL CMOS-pw process, design
rules for the MOSIS supported three-micron CMOS-pw process
were obtained. Professor Marco Annaratone at
Carnegie-Mellon University (CMU) generated the listing of the
three-micron CMOS-pw design rules compatible with Lyra and
has provided NPS with a copy. To generate executable code
from the prototype Lyra program and embed the specific
process design rules, the program rulec (see Appendix B) is
run with the design rule list file as its argument.

Now, when Lyra is invoked from Caesar while editing a 
CMOS-pw technology circuit, the three-micron minimum feature 
size CMOS-pw design rules are applied. This version of Lyra 
does not check for exceeding any maximum dimensions. The 
only maximum size design rule in this technology is for 
contact cuts, which may not exceed 3 microns by 8 microns. 
Avoidance of improper contact cuts can be accomplished by 
utilizing Caesar's hierarchical nature. Contact cuts of all 
needed sizes and types are generated once and saved to be 
inserted as cells wherever needed. 

C. SIMULATION 

Once a circuit layout has completed this initial design 
loop, it matches the designer’s conception of how it should 
appear and is free of design rule violations. The perform- 
ance of the given circuit, though, remains uncertain. To 
simulate the performance of the design, programs such as 
SPICE [Ref. 11] and RNL [Ref. 11] are used.






1. SPICE 



SPICE is an important simulation tool in the design 
of high speed CMOS digital and analog circuits. With its 
detailed device modeling, SPICE can provide accurate 
predictions of performance once the device parameters of the 
implementation technology are known. SPICE provides the 
logical output of a circuit based upon the inputs and 
describes the transient behavior of the circuit as it 
changes to the new logical output. Thus SPICE enables a 
designer to optimize transistor dimensions for speed. 

Unfortunately, the version of SPICE currently available
on both the Vax 11-780 and the IBM 3033 at NPS (version
2G6) fails when the parameters of the devices fabricated by
the MOSIS three-micron CMOS-pw process are used. With these
parameters the transient behavior solutions do not converge.

Engineers at CMU, UCB, and the University of
Washington (UW) are currently employing an experimental
version of SPICE (version 2X.x developed at UCB) which is
successful in simulating with the three-micron CMOS-pw device
parameters. This version, however, has other bugs and is
therefore not available for general distribution. The
changes to SPICE 2G6 that enable SPICE 2X.x to simulate the
three-micron CMOS-pw devices will be incorporated into the
next distribution of SPICE (version 2G7). The Naval
Postgraduate School is in the queue of institutions to
receive SPICE 2G7 once it is ready.

In order to run a SPICE simulation of a CMOS circuit 
designed using Caesar, the following steps should be 
executed. First, the labeling feature of Caesar is used to 
place labels on the electrical nodes of interest in the 
circuit (Vdd, GND, input, output, etc.). Second, the Caesar 
command 

 :cif 100 -p 

is issued to generate the basename.cif file. The parameter 
100 indicates a scale of 100 centimicrons per Caesar unit 
(see footnote 5) and must be specified unless the default 
value of 200 centimicrons per Caesar unit is desired. The -p 
option causes entries to be made in the basename.cif file for 
the labels assigned. Third, after exiting Caesar and returning 
to Unix, the circuit extractor Mextra [Ref. 10] is invoked 
using the command 

% mextra basename 

to create the file basename.sim. To convert the basename.sim 
file to a SPICE file (basename.spice), the program sim2spice 
[Ref. 11] is used. The basename.spice file contains a list 
of transistors and capacitors in the circuit in a SPICE 
compatible format. 

The basename.spice file must be edited to add the 
model parameters for the transistors, to specify the waveforms 
of the input(s), to specify the type of analysis to be 
performed (usually transient analysis), and to specify the 
output to be produced (tables, graphs, etc.). The Spice 
User's Manual [Ref. 11] contains the formats of these additions 
to basename.spice. Best case and worst case device 
model parameters for the MOSIS three-micron CMOS-pw process, 
as compiled by Dr. M. Annaratone of CMU and Dr. L. Glasser 
of MIT, are found in Appendix A. 

2. RNL 

RNL is a timing and logic simulator for digital MOS 
circuits. It is an event driven simulator which uses a 
resistance-capacitance model of a circuit to estimate node 
transition times and to estimate the effects of charge 



5 Since the minimum dimensions for the 3-micron CMOS-pw 
process are specified in microns instead of lambda, CMOS-pw 
circuits are usually designed on Caesar using one micron per 
Caesar unit. 

sharing (see footnote 6). After input values have been assigned 
by the user, RNL calculates the effects of those inputs by 
repeating the following operations until there are no further 
node value changes: (1) when a node is added to the network due 
to a transistor being turned on, the charge sharing implications 
of the new node's capacitance and logic state on each of its 
electrical neighbors are computed, (2) for each node that 
might be affected, Vthev and Rthev (the parameters of the 
Thevenin equivalent circuit) are calculated and the new 
logic state is determined from Vthev (0.0Vdd to 0.3Vdd = 
logic 0, 0.8Vdd to 1.0Vdd = logic 1, logic X otherwise), (3) 
if the node has changed state, the transition time is calculated 
using the node's capacitance, and (4) any changes are 
propagated to other nodes. Details of the computation 
methods used by RNL can be found in the RNL Version 4.2 (UW) 
User's Guide [Ref. 11]. More important to the user is an 
understanding of what information RNL keeps, what it 
discards, and how it decides what to do next. 
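
The threshold rule in step (2) can be illustrated with a short Python sketch. This is only an illustration of the stated voltage bands, not RNL's code; the function name and the 5-volt default supply are assumptions made for the example.

```python
def logic_state(v_thev: float, vdd: float = 5.0) -> str:
    """Classify a Thevenin voltage using the bands quoted in the text.

    vdd defaults to 5.0 V purely for illustration (an assumption).
    """
    ratio = v_thev / vdd
    if 0.0 <= ratio <= 0.3:
        return "0"   # 0.0Vdd to 0.3Vdd is logic 0
    if 0.8 <= ratio <= 1.0:
        return "1"   # 0.8Vdd to 1.0Vdd is logic 1
    return "X"       # anything else is the undefined state


for volts in (1.0, 2.5, 4.5):
    print(volts, logic_state(volts))
```

A node held at mid-supply therefore reads as X, which is the state reported for the exclusive OR gate discussed later in this section.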

Basic to the operation of RNL is the idea of an 
event. The three elements of an RNL event are: (1) a node 
in the network, (2) a new logic state for the node, and (3) 
the time when the node value changes to the new logic state. 
RNL maintains a list of events, sorted by time, that tells 
what processing remains to be done. When the user changes 
an input, an event is added to the list. RNL sequentially 
processes the next event on the list, stopping when (1) the 
list is empty, (2) a node the user is tracing changes value, 
or (3) when the specified simulation time interval has 
elapsed. To process an event, RNL removes it from the list, 
changes the node's state to reflect its new value, and then 



6 Charge sharing refers to the capacitive effects that 
happen when two or more previously unconnected nodes, each 
having some charge and capacitance, become connected by a 
resistor (a transistor turning on). 

calculates any new events resulting from the node's new 
value. 
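
The event loop just described can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not RNL's implementation: the `Event` class, the `run` function, and the `fanout` callback (which stands in for RNL's calculation of new events) are all hypothetical names introduced here.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float                             # when the node changes
    node: str = field(compare=False)        # which node changes
    new_state: str = field(compare=False)   # its new logic state


def run(events, state, fanout, traced=(), until=float("inf")):
    """Process events in time order; return why the loop stopped."""
    heapq.heapify(events)                   # keep the list sorted by time
    while events:
        if events[0].time > until:
            return "time limit"
        ev = heapq.heappop(events)          # remove the event from the list
        changed = state.get(ev.node) != ev.new_state
        state[ev.node] = ev.new_state       # update the node's state
        for new_ev in fanout(ev, state):    # schedule resulting events
            heapq.heappush(events, new_ev)
        if changed and ev.node in traced:
            return "traced node changed"
    return "list empty"


def inverter(ev, state):
    # Hypothetical network: node b is the inverse of node a, one time unit later.
    if ev.node == "a":
        return [Event(ev.time + 1.0, "b", "0" if ev.new_state == "1" else "1")]
    return []


state = {"a": "0", "b": "1"}
print(run([Event(0.0, "a", "1")], state, inverter), state)
```

The three return values correspond to the three stopping conditions listed above.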

In calculating new events, first all nodes that 
might be affected by the change are found and marked. This 
includes the source and drain of all transistors for which 
the current node is the gate and all nodes connected to 
these nodes through turned on transistors. The search 
through the network stops when a non-conducting transistor 
or an input is reached. For each marked node, two calcula- 
tions are made. First, a charge sharing calculation is 
performed to model changes of state due to the charging and 
discharging of node capacitances. Second, a final value 
calculation is done to determine the node's ultimate logical 
state. 

A given node can have only two events pending: (1) a 
charge sharing event describing an immediate change in the 
node's state due to charge redistribution among the nodes on 
the connection list, and (2) a final value event describing 
the final, driven state of the node. RNL observes the 
following rules for processing events: (1) when a new charge 
sharing event is scheduled, throw away all previously 
pending events for the node, and (2) when a new final value 
event is calculated, it will be ignored if (a) there is a 
pending final event for the same value which is scheduled to 
occur sooner, (b) there is a pending charge sharing event 
for the same value as the new final event, or (c) there is 
no charge sharing event and the new final value event is the 
same as the node's current value. These rules are based on 
the assumption that the event that was last calculated 
reflects the latest configuration of the network and there- 
fore should override events calculated earlier. Charge 
sharing events discard any pending final value events 
because any charge sharing calculation is immediately 
followed by a new final value calculation. 
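
The bookkeeping rules above can be written out as a small Python sketch. It is a paraphrase of the rules as stated in this section, with hypothetical class and function names; RNL's actual data structures are not described in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pending:
    value: str
    time: float


@dataclass
class Node:
    value: str                                 # current logic state
    charge_sharing: Optional[Pending] = None   # at most one pending
    final: Optional[Pending] = None            # at most one pending


def schedule_charge_sharing(node: Node, value: str, time: float) -> None:
    # Rule (1): a new charge sharing event discards all pending events.
    node.final = None
    node.charge_sharing = Pending(value, time)


def schedule_final(node: Node, value: str, time: float) -> bool:
    """Apply rule (2); return True if the new final value event is kept."""
    f, cs = node.final, node.charge_sharing
    if f is not None and f.value == value and f.time < time:
        return False   # (a) same value is already due sooner
    if cs is not None and cs.value == value:
        return False   # (b) charge sharing already produces this value
    if cs is None and value == node.value:
        return False   # (c) no change from the node's current value
    node.final = Pending(value, time)
    return True
```

Because a node holds at most one pending event of each kind, a later calculation simply overwrites an earlier one, which is the source of the failures described next.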
These event rules, however, sometimes lead RNL to 
generate incorrect results. This is especially true of 
signal driven circuits (circuits where inputs are applied to 
the source and drain of a transistor as well as its gate) 
and circuits that depend on the analog properties of the 
devices to predict the behavior of the circuit. For 
example, consider the first exclusive OR gate design for the 
pipelined adder in Figure 3.1. This design has proven to 
function correctly at CMU; however, the RNL simulation shows 
this circuit failing. 

Starting in a state where A=0, B=1, and out=1, 
assume that the input A then transitions to 1. Initially 
Q1, Q3, Q4, and Q6 are on. When input A goes high, Q3 is 
turned off (no events generated) and Q2 is turned on, generating 
a charge sharing event and a final value event for 
Abar resulting in Abar going low. When Abar goes low, the 
still turned on Q6 is now trying to drive the output node 
low and the still turned on Q4 (RNL recognizes that it takes 
a finite amount of time for Q4 to turn off but does not 
recognize that n-channel transistors do not conduct high 
voltages well) is still trying to drive the output node 
high. The result is an output of X, the undefined state. 
Next, Q4 is turned off. Since turning off Q4 adds no new 
nodes to the network, the event list is empty and the output 
remains at X. The primary difficulty RNL has with this 
circuit centers around the fact that the output node is 
controlled by two nodes that can change at different times. 
As a result, a charge sharing event due to one input can 
eliminate a final value event of the other, with that final 
value event being the force which determines the circuit's 
actual behavior. 

The circuit of Figure 3.2 is a proven latch design 
which also fails in RNL simulation. In Figure 3.2 the fractions 
next to the transistors represent the length to width 
ratios of the devices. This circuit is dependent on these 
ratios for proper operation. These ratios ensure that the 
gain of the input signal on the gates of Q5 and Q6 is 
greater than the gain of the feedback signal to the same 
gates. RNL does not recognize the difference in these gains 
to be sufficient to cause the gates of Q5 and Q6 to be at 
either logical 1 or 0 when the input signal is the opposite 
of the feedback signal. As a result, the circuit becomes 
locked up at X. Because of RNL's difficulty with these two 
circuits, other designs were employed in the final adder 
(see chapter 5) to facilitate testing of the overall design. 

Figure 3.2 CMOS Latch Design [Ref. 6]. 

To use RNL as installed at NPS, the following steps 
should be followed. First label the circuit and generate 
basename.cif as before. Again the program Mextra is used to 
extract the circuit, this time with the -o option (mextra 
basename -o). The -o option causes Mextra not to compute 
capacitances. A follow on program in this sequence, Presim, 
performs this computation with greater accuracy. It should 
be noted that there are three different circuit extraction 
programs, each named Mextra. There is the MIT version, the 
UCB version and the UW modified UCB version. The next tool 
to be used in the sequence, Presim, can accept the output 
format of the MIT version and the UW modified UCB version. 
At NPS, the UCB version is installed and was used. The MIT 
and UW modified UCB versions differ in the order of the 
parameters in a transistor specification. Professor 
Annaratone at CMU developed a program, cformat, to change a 
.sim file generated by the UCB version to the MIT format. 
However, cformat does not work if the -o option is used with 
Mextra. To avoid a loss of accuracy, the .sim file can 
manually be changed to the UW modified UCB format. The 
first step in this format change is to use the vi text 
editor to add "format: UCB" to the header line of basename.sim. 
The other change that needs to be made is to 
change the labels for the n-channel transistors from "n" to 
"e". Using the ex editor, the following steps accomplish 
this: 

 % e basename.sim     - invokes the editor 
 :g/^n/s//e/g         - make global change: for all n as first 
                        char in a line, change to e 
 :w                   - write back edited file 
 :q                   - exit editor 

The next step is to create a binary file for RNL 
from basename.sim using Presim. This is done by issuing the 
command: 

 % presim basename.sim basename config 



Basename.sim is the edited .sim file and basename is the 
file into which presim writes its binary output. Config is 
the calibration file used to select other than default 
values for the circuit element capacitance and resistance. 
A copy of the presim user’s guide from the UW/NRC VLSI 
Consortium release 2.0 and the calibration file used in 
simulating the adder are contained in Appendix C. The 
values used in the calibration file are taken from the MOSIS 
supplied electrical parameters. 

The final step is to run RNL itself. This is done 
by entering one of the following two Unix commands: 

% rnl or 

% rnl cmdfile 



where cmdfile is the name of a file containing a sequence of 
RNL commands. Entering the first Unix command will cause 

RNL to take its commands directly from the console 
interactively. If the second Unix command is used, specifying 
a command file, RNL first executes all the commands in 
cmdfile and, upon completion, starts taking commands from the 
console. In either case, RNL should be given the following 
commands: 

 (load "uwstd.l") 
 (load "uwsim.l") 
 (read-network "basename") 

where basename is the file generated by presim. The first 
two commands load RNL with several macros which simplify 
user interfacing with RNL. 

The user interface with RNL is a LISP interpreter. 
The interpreter continuously executes the loop: (1) read a 
command, (2) evaluate the command and perform the specified 
actions, and (3) print the result. There are two formats 
for specifying commands to this loop. The first is: 

 (function argument argument ... argument) 



Here the parentheses delimit the command and spaces separate 
the elements. The interpreter reads the entire command, up 
to the closing parenthesis, then the first element is inter- 
preted as a function and all the others as arguments. The 
arguments may be of the same command form, (function arg arg 
... arg). If the following command were issued to RNL, 

 (* 12 (+ 2 2) (/ 14 7)) 

RNL would respond by typing 96 (12*4*2). The other format 
for commands to RNL is 

(function ’(argument argument ... argument)) 

where the "'" indicates the quote special form, which keeps 
its argument from being evaluated. For example, (+ 2 3) 
evaluates to 5, but '(+ 2 3) is a list of three elements. 
When this second RNL command format is not used to represent 
an argument of another command (i.e. is not contained within 
the parentheses of another command), it may be written in 
the more natural form: 

function argument argument .... <newline> 
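
The evaluation rule and the effect of the quote form can be mimicked with a toy prefix evaluator. The Python sketch below is only a model of the behavior described above, with nested lists standing in for parenthesized commands; it is not RNL's LISP interpreter, and the `evaluate` function and `OPS` table are hypothetical.

```python
import operator
from functools import reduce

# Only the three arithmetic functions used in the text's example.
OPS = {"+": operator.add, "*": operator.mul, "/": operator.floordiv}


def evaluate(expr):
    """Evaluate a nested prefix expression written as Python lists."""
    if not isinstance(expr, list):
        return expr                  # a number evaluates to itself
    if expr[0] == "quote":
        return expr[1]               # quoted data is returned unevaluated
    func, *args = expr
    # Arguments are evaluated first, then the function is applied.
    return reduce(OPS[func], [evaluate(a) for a in args])


print(evaluate(["*", 12, ["+", 2, 2], ["/", 14, 7]]))   # prints 96
print(evaluate(["quote", ["+", 2, 3]]))                 # the list itself
```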

Tutorials on RNL are contained in the University of 
Washington/Northwest VLSI Consortium's VLSI Design Tools 
Reference Manual [Ref. 11]. There are two points concerning 
the Mextra, Presim, RNL simulation cycle that a user should be 
aware of and that are not brought out in the documentation. The 
first concerns the use of vectors in RNL commands. As 
evidenced in the tutorials of Reference 11 and the adder 
simulation results in Appendix D, vectors can be used to 
make the input and output of RNL less cumbersome and 
verbose. After the vector has been defined, a user will 
then want to assign values to it. The documentation shows 
the format of the vector value assignment command to be: 

 (invec '(vecname values)) 

However, the "values" field has its own specific format. 
The first character should be a 0 or a 1 indicating positive 
and negative numbers, respectively. The LISP interpreter 
will work with negative numbers but RNL will not accept 
negative numbers as logical inputs. The second character is 
a letter specifying the number base of the input vector (b 
for binary, h for hexadecimal). For example, to assign the 
binary value +101010 to the vector vectone, the RNL command 
would be: 

 (invec '(vectone 0b101010)) 
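
The layout of the "values" field can be made concrete with a small Python parser. This sketch follows only the format described in the preceding paragraph (sign digit, base letter, digits); the function name is hypothetical and the code is not part of RNL.

```python
def parse_vector_value(text: str) -> int:
    """Decode a values field such as '0b101010' into an integer."""
    sign, base, digits = text[0], text[1], text[2:]
    if sign not in ("0", "1"):
        raise ValueError("first character must be 0 (positive) or 1 (negative)")
    if sign == "1":
        # The text notes that RNL rejects negative numbers as logical inputs.
        raise ValueError("negative values are not accepted as logical inputs")
    radix = {"b": 2, "h": 16}[base]   # b = binary, h = hexadecimal
    return int(digits, radix)


print(parse_vector_value("0b101010"))   # prints 42
print(parse_vector_value("0h2A"))       # prints 42
```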

The other point concerns the location of input 
labels on the input pads. When the entire chip is being 
simulated, the input labels are normally placed on the metal 
pads where the off chip leads are attached. Before an input 
signal from a bonding pad reaches the interior circuits of a 
chip it must pass through a resistor in an overvoltage 
protection circuit. In the extraction and simulation 
process this resistor is viewed as an open circuit. 
Therefore, on input pads, the input label must be placed 
after the resistor in the signal path. 

With Caesar, Lyra, and RNL, a designer at NPS has 
the requisite CAD tools for the complete logical circuit 
design loop. With these tools circuits that are free of 
design rule errors and produce the desired logical results 
can be designed. The lack of SPICE somewhat restricts the 
designer’s ability to optimize speed, but there are several 
design techniques that can be employed to design chips that 
run fast. These will be covered in the next chapter. 

IV. DESIGN OF THE ADDER 



As stated in the introduction, the primary goals of the 
adder design are to maximize throughput and to provide for 
testability. The adder is to be a pipelined adder. Every 
clock cycle it should accept as inputs two 16-bit addends 
(A1, the least significant bit, through A16 and B1, the 
least significant bit, through B16) and one carry-in (Cin) 
bit. It is desired to produce the 16-bit sum (S1, the least 
significant bit, through S16) and the carry-out (Cout) bit 
as quickly as possible. Both the number of clock cycles 
from input of the addends to the output of the sum and the 
duration of each clock cycle are to be minimized. A secondary 
consideration in the design is expandability. An 
expandable design is one that can easily be extended to 
produce a 32-bit or 64-bit sum utilizing the same circuit 
structures. In this chapter the logical design and layout 
design of the 16-bit adder will be presented. The equations 
presented in this chapter are taken or derived from equations 
found in chapters three through six of The Logic of 
Computer Arithmetic by Flores [Ref. 12]. In these equations 
concatenation implies the logical AND, the symbol + implies 
the logical OR, and the symbol ⊕ implies the logical XOR. 

A. LOGICAL DESIGN 

In considering the speed spectrum of adders from a 
logical standpoint, at the fast end there is the table 
look-up. With 33 binary inputs and 17 outputs, this would 
require an address space of 2^33 17-bit words. With current 
technology this is not feasible. At the other end of the 
spectrum is the serial adder. On clock cycle 1 it uses A1, 
B1, and Cin to produce S1 and C1out (carry out of bit 1 
into bit 2). On clock cycle 2 it uses A2, B2, and C1out to 
generate S2 and C2out. Here 16 clock cycles elapse before 
the sum is available. An adder can also be implemented as a 
ripple carry adder where the duration of each clock pulse is 
sufficient to allow a carry into the sum to propagate all 
the way through to a carry out. In the case of the 16-bit 
adder, this would require a clock duration at least sixteen 
times the length of the gate delay of the one bit adder. 
The middle ground belongs to the carry look-ahead adder 
[Ref. 3]. In carry look-ahead (CLA) addition the carry into 
each bit position, C(i), is generated from the propagate, 
P(i), and generate, G(i), primitives. 

$$P(i) = A(i) \oplus B(i) \qquad \text{(eqn 4.1)}$$

$$G(i) = A(i)\,B(i) \qquad \text{(eqn 4.2)}$$

P(i)=1 implies that a carry into bit(i) will be propagated 
through to bit(i+1). G(i)=1 implies that A(i) and B(i) will 
provide a carry into bit(i+1) of the sum, regardless of the 
contents of the less significant bits of A and B. 

$$C(i) = G(i-1) + G(i-2)\,P(i-1) + \cdots + C_{in}\,P(i-1) \cdots P(2)\,P(1) \qquad \text{(eqn 4.3)}$$

$$S(i) = C(i) \oplus P(i) \qquad \text{(eqn 4.4)}$$

The algorithm for the CLA sum generation is as follows. The 
first event is the evaluation of equations 4.1 and 4.2 to 
generate the P(i) and G(i) primitives. The second event uses 
the P(i) and G(i) primitives as inputs to equation 4.3 to 
generate the C(i)'s. The final event is the computation of 
the S(i)'s from equation 4.4. 
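
The three events can be checked in software. The following Python sketch evaluates equations 4.1 through 4.4 bit by bit and compares the result with ordinary integer addition; it is a behavioral model only, the function name `cla_add` is introduced here for illustration, and the carry out is taken to be the carry into a notional bit 17.

```python
def cla_add(a: int, b: int, cin: int = 0, n: int = 16):
    """Add two n-bit numbers using eqns 4.1-4.4; return (sum, carry_out)."""
    # Index i matches the text: bit 1 is the least significant bit.
    A = [0] + [(a >> k) & 1 for k in range(n)]
    B = [0] + [(b >> k) & 1 for k in range(n)]
    P = [0] + [A[i] ^ B[i] for i in range(1, n + 1)]   # eqn 4.1
    G = [0] + [A[i] & B[i] for i in range(1, n + 1)]   # eqn 4.2

    def carry_into(i):                                 # eqn 4.3
        c = 0
        for k in range(i - 1, 0, -1):
            term = G[k]                  # G(k) P(k+1) ... P(i-1)
            for j in range(k + 1, i):
                term &= P[j]
            c |= term
        term = cin                       # Cin P(1) ... P(i-1)
        for j in range(1, i):
            term &= P[j]
        return c | term

    C = [0] + [carry_into(i) for i in range(1, n + 2)]
    s = sum((C[i] ^ P[i]) << (i - 1) for i in range(1, n + 1))  # eqn 4.4
    return s, C[n + 1]


print(cla_add(1234, 4321))        # (5555, 0)
print(cla_add(0xFFFF, 1))         # (0, 1)
```

Every carry is computed directly from the primitives and Cin, never from another carry, which is what distinguishes look-ahead from ripple carry.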
As pointed out by Flores [Ref. 12] and by Conradi and 
Hauenstein [Ref. 3], there are several logical implementations 
of carry look-ahead addition. A principal task of 
this thesis investigation was to select a fast logical 
design. Without the circuit simulator SPICE, the analysis 
of each design considered was more qualitative than quantitative. 
In this qualitative analysis, a turned on transistor 
is considered as a resistor with its resistance 
proportional to its length and inversely proportional to its 
width. All gates driven by such a turned on transistor are 
considered to be capacitive loads with capacitance proportional 
to the area of the gate. The interconnect wiring is 
considered to add both parallel capacitive loading and 
series resistance as shown in Figure 4.1. 




[Figure: schematic with labels Rwire and Cgate n] 

Figure 4.1 CMOS Output Loading Model. 

From this model it is obvious that the amount of interconnect 
wiring and the number of gates driven (fanout) 
should be minimized to minimize the output transition time 
when the positions of switches S1 and S2 of Figure 4.1 are 
reversed. This led to the following guidelines in the 
design of the adder: 

1) The internal logic of each stage should be accomplished 
with minimum dimension transistors, 3 microns 
x 4 microns (length x width). This leads to more 
compact circuits with shorter interconnections and 
reduces the capacitive load on the preceding stage. 

2) Significantly wider transistors (3-micron x 9-micron) 
should be used at the output of each stage where the 
fanout and interconnect loading is greater. 

3) The fanout of any transistor should be kept to less 
than five. 

This requires a more complete definition of fanout 
because the capacitive loading of a gate depends on its 
area. A 3-micron x 4-micron transistor driving six other 
3-micron x 4-micron transistors has a fanout of six. A 
3-micron x 8-micron transistor driving the same load is 
considered to have a fanout of three. Though this implies 
that a high fanout problem can be solved by merely 
increasing the width of the driving transistor, it neglects 
the effects of the interconnect wiring. As gates are added 
to the load of a transistor, each subsequent addition must 
be more remote from the driving transistor. Since the 
resistance of the wiring is proportional to its length and 
inversely proportional to its width, the resistance of the 
wiring will increase unless the width is also increased. 
However, since the capacitance of the wiring is proportional 
to its area, most of the gain achieved by widening the wire 
to reduce resistance is offset by the increase in capaci- 
tance. As a result, in the design of the adder, increasing 
the width of the driving transistor was not viewed as a 
complete fix for a fanout problem. 
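
The two fanout figures quoted above are consistent with taking fanout as the total gate area driven divided by the gate area of the driving transistor. The Python sketch below encodes that reading; the formula is inferred from the two examples in the text rather than stated there, and, as the paragraph notes, it ignores interconnect wiring.

```python
def effective_fanout(driver, loads):
    """Ratio of driven gate area to driver gate area.

    Each transistor is a (length, width) pair in microns. The area-ratio
    definition is an assumption inferred from the examples in the text.
    """
    driver_area = driver[0] * driver[1]
    load_area = sum(length * width for length, width in loads)
    return load_area / driver_area


six_minimum_gates = [(3, 4)] * 6
print(effective_fanout((3, 4), six_minimum_gates))   # 6.0
print(effective_fanout((3, 8), six_minimum_gates))   # 3.0
```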

For the comparison of the different approaches to CLA 
addition, the term logical event needs to be defined. The 
most basic definition is a combinational logic circuit 
accepting a set of inputs, performing its specified operations 
on those inputs and generating a set of outputs. 
Therefore, the input of the addends, followed by the computation 
and output of the sum can be considered as a logical 
event. However, a primary design consideration for the 
adder is to provide for testability and a key element of 
this provision is the availability of intermediate results 
(see section B of this chapter). This implies breaking up 
the sum generation into several separate events. The first 
event takes the addends as inputs, performs some logic 
operation(s) on them and stores the results in a register. The 
next event takes its inputs from that register and stores 
its results in another register. This chain continues until 
the last event deposits the sum on the output pads of the 
chip. To provide the tester with easily interpreted intermediate 
results, the equations presented in this chapter 
were taken as boundaries for each logical event. The terms 
on the right side of the equation determine the inputs and 
the left side terms determine the output of a logical event. 
Once all the inputs for an equation are generated by 
previous events, the logic of the equation becomes part of 
the current event. 

1. Zero Level CLA Logic 

This logic requires three events to generate the 
sum. First, equations 4.1 and 4.2 are used to generate the 
P(i)'s and G(i)'s. Second, from equation 4.3 the C(i)'s are 
generated. Finally, the sum is derived from equation 4.4. 
The principal problem with this approach for a sixteen-bit 
adder lies in the application of equation 4.3. Here, the 
input P(1) has a fanout of 15, which makes this approach 
unsatisfactory. 
2. First Level CLA Logic 

Noting that a four-bit sum generated using zero 
level CLA logic is within the design guidelines suggests 
cascading 4-bit slices of the same logic as indicated in 
Table 2. Here the sum is available after six events and the 
fanout is reduced by a factor of four. The event cycle time 
reduction would more than make up for the event count 
increase since cycle time grows faster than linearly with 
fanout. The only drawback with this design lies in the cost 
of extending it to generate 32-bit or 64-bit sums. For 
every 4-bit slice added, another event is required. Thus, a 
64-bit add, with sixteen slices, would require 18 events. 

                            TABLE 2 

             First Level CLA Logic for a 16-bit Sum 

Event   Bits 1-4           Bits 5-8           Bits 9-12          Bits 13-16 
No. 

  1     Compute P(i),G(i)  Compute P(i),G(i)  Compute P(i),G(i)  Compute P(i),G(i) 
  2     Compute C(i)       Delay P(i),G(i)    Delay P(i),G(i)    Delay P(i),G(i) 
  3     Compute S(i)       Compute C(i)       Delay P(i),G(i)    Delay P(i),G(i) 
  4     Delay S(i)         Compute S(i)       Compute C(i)       Delay P(i),G(i) 
  5     Delay S(i)         Delay S(i)         Compute S(i)       Compute C(i) 
  6     Delay S(i)         Delay S(i)         Delay S(i)         Compute S(i) 

3. Second Level CLA Logic 

Again the data is divided into 4-bit slices called 
blocks. But rather than let the carries ripple through the 
blocks, two new primitive functions are introduced. They 
are the block propagate, BP(i), and block generate, BG(i), 
functions. BP(i)=1 implies that a carry into block(i) will 
be propagated through to block(i+1). BG(i)=1 implies that 
block(i) will generate a carry into block(i+1). For a 4-bit 
block where bit(1) is the least significant bit, the BP and 
BG primitives are generated by equations 4.5 and 4.6 respectively, 
with the P(i)'s and G(i)'s computed as before. 

$$BP(i) = P(4)\,P(3)\,P(2)\,P(1) \qquad \text{(eqn 4.5)}$$

$$BG(i) = G(4) + G(3)\,P(4) + G(2)\,P(4)\,P(3) + G(1)\,P(4)\,P(3)\,P(2) \qquad \text{(eqn 4.6)}$$



Next, the block carry, BC(i), which represents the carry 
from block(i) into block(i+1), is computed using equation 
4.7, which represents the same logic as equation 4.3. 



$$BC(i) = BG(i) + BG(i-1)\,BP(i) + \cdots + C_{in}\,BP(i)\,BP(i-1) \cdots BP(1) \qquad \text{(eqn 4.7)}$$



So far, after three events, the P(i)'s, G(i)'s, 
BP(i)'s, BG(i)'s, and BC(i)'s have been generated. If the 
same method of generating the final sum as used in zero 
level CLA were to be used, two additional events would be 
required. The first again applies the logic of equation 4.3 
to each 4-bit block to generate the carry into each bit. 
Here the Cin for block(i) is given by BC(i-1). The second 
event is used to generate the sum from the C(i)'s and 
P(i)'s. One of these events can be eliminated if, while the 
BC(i)'s and their predecessors are being computed, an estimated 
sum of the 4-bit block is also computed. One method 
is to compute two estimated sums for each block, one 
assuming a carry into the block of 0 and the other assuming 
a carry in of 1. When the correct carry in for block(i) is 
generated, it is used to multiplex the correct sum for the 
block to the output. This assumed carry method was rejected 
because of the large amount of area consumed by the registers 
needed to hold two possible answers. The second method 
is to compute the estimated sum of the block assuming a 
carry-in of 0 and then to correct the estimated sum once the 
actual carry-in to each block is known. 

Since the estimated sum, ES(i), is not needed until 
after the third event and computing it as one event again 
leads to fanout problems, ES(4), the most significant bit, 
through ES(1) are computed in two events as 
follows. First, an intermediate estimated sum, IES(i), is 
computed using two-bit slices, each assuming a 0 carry in 
(see equations 4.8 through 4.11). At the same time, a carry 
from bit(2) into bit(3) (IC23) is computed using equation 
4.12. On the next event, ES(i) is computed from the IES(i)'s 
and IC23 using equations 4.13 through 4.16. 



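$$IES(1) = P(1) \qquad \text{(eqn 4.8)}$$

$$IES(2) = P(2) \oplus G(1) \qquad \text{(eqn 4.9)}$$

$$IES(3) = P(3) \qquad \text{(eqn 4.10)}$$

$$IES(4) = P(4) \oplus G(3) \qquad \text{(eqn 4.11)}$$

$$IC23 = G(2) + G(1)\,P(2) \qquad \text{(eqn 4.12)}$$

$$ES(1) = IES(1) \qquad \text{(eqn 4.13)}$$

$$ES(2) = IES(2) \qquad \text{(eqn 4.14)}$$

$$ES(3) = IC23 \oplus IES(3) \qquad \text{(eqn 4.15)}$$

$$ES(4) = [IES(3)\,IC23] \oplus IES(4) \qquad \text{(eqn 4.16)}$$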






Now, after three events, estimated sums for each 
4-bit block and the actual carry into each block (Cinb) are 
available. From these the sum can be computed using equations 
4.17 through 4.20. 

$$S(1) = C_{inb} \oplus ES(1) \qquad \text{(eqn 4.17)}$$

$$S(2) = [C_{inb}\,ES(1)] \oplus ES(2) \qquad \text{(eqn 4.18)}$$

$$S(3) = [C_{inb}\,ES(1)\,ES(2)] \oplus ES(3) \qquad \text{(eqn 4.19)}$$

$$S(4) = [C_{inb}\,ES(1)\,ES(2)\,ES(3)] \oplus ES(4) \qquad \text{(eqn 4.20)}$$


With second level CLA logic, the 16-bit sum is 
generated in only four events. Additionally, this design 
can easily be extended to the generation of 64-bit sums. 
The logic of equations 4.5 and 4.6 which produced the second 
level primitives BP and BG can be used again to generate 
third level primitives, B3P and B3G. These third level 
primitives represent the carry propagate and carry generate 
properties of 16-bit slices. The carry into each 16-bit 
block is provided by implementing equation 4.7. Thus, 
adding one event will provide the carry into each of four 
16-bit blocks of a 64-bit sum. The logic of equation 4.3 is 
then used to generate the carry into each 4-bit block of the 
sum and the final sum is computed as before. The final 
result is that by adding two events, for a total of six, and 
using the same logic as before (i.e. no new circuits need to 
be designed), the 16-bit adder can be extended to a 64-bit 
adder. 
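
The second level scheme can be checked against ordinary addition with a short behavioral model. The Python sketch below follows equations 4.5, 4.6 and 4.8 through 4.20 for each 4-bit block; it is an illustration, not the adder's circuit. The function names are introduced here, and the block carry is formed recursively as BG + BP·(carry in), which is logically equivalent to the expanded look-ahead form of equation 4.7.

```python
def block_signals(a4: int, b4: int):
    """Return (BP, BG, ES) for one 4-bit block; ES assumes a carry-in of 0."""
    P = [None] + [((a4 >> k) ^ (b4 >> k)) & 1 for k in range(4)]
    G = [None] + [((a4 >> k) & (b4 >> k)) & 1 for k in range(4)]
    BP = P[4] & P[3] & P[2] & P[1]                              # eqn 4.5
    BG = (G[4] | G[3] & P[4] | G[2] & P[4] & P[3]
          | G[1] & P[4] & P[3] & P[2])                          # eqn 4.6
    IES = [None, P[1], P[2] ^ G[1], P[3], P[4] ^ G[3]]          # eqns 4.8-4.11
    IC23 = G[2] | G[1] & P[2]                                   # eqn 4.12
    ES = [None, IES[1], IES[2], IC23 ^ IES[3],
          (IES[3] & IC23) ^ IES[4]]                             # eqns 4.13-4.16
    return BP, BG, ES


def correct_sum(ES, cinb: int) -> int:
    """Apply eqns 4.17-4.20; return the block's 4-bit sum as an integer."""
    s, ripple = 0, cinb
    for i in range(1, 5):
        s |= (ripple ^ ES[i]) << (i - 1)
        ripple &= ES[i]        # Cinb ES(1) ... ES(i) feeds the next bit
    return s


def cla2_add(a: int, b: int, cin: int = 0):
    """16-bit add built from four blocks; return (sum, carry_out)."""
    total, carry = 0, cin
    for blk in range(4):
        a4, b4 = (a >> (4 * blk)) & 0xF, (b >> (4 * blk)) & 0xF
        BP, BG, ES = block_signals(a4, b4)
        total |= correct_sum(ES, carry) << (4 * blk)
        carry = BG | BP & carry            # carry out of this block
    return total, carry


print(cla2_add(0x00FF, 0x0001))   # (256, 0)
```

Correcting the estimated sum amounts to incrementing it by the block carry-in, which is why each sum bit needs only the carry-in and the lower estimated sum bits.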
B. DESIGN FOR TESTABILITY 



Another primary objective of the adder design was to 
provide for testability, that is, the ability to logically 
detect fabrication errors or circuit malfunctions rather 
than visually searching for faults with a microscope. 

As the complexity of integrated circuits has grown, the 
ability to logically detect faults using only the normally 
available inputs and outputs has decreased markedly. As 
complexity increases, the number of likely faults to be 
tested for and the number of input vectors required to 
isolate a specific fault grow rapidly. Unless a design 
technique is used which allows the tester to examine the 
interior logic of a chip, the order of magnitude of the 
number of input vectors required to perform useful logical 
testing is prohibitive. Thus, if logical testability is 
desired, a design technique that provides for it must be 
used. 

One such design technique is level sensitive scan design 
(LSSD) [Ref. 13]. Level sensitive implies that the output 
of any logic element is dependent only on the levels of its 
inputs. No logic elements are allowed to depend on a tran- 
sition such as in an edge triggered flip flop. Scan design 
implies that all memory elements in the design are to have 
an auxiliary function where their contents are serially fed 
to an output pad for examination. This gives a tester the 
ability to examine intermediate results. In applying the 
LSSD technique to the adder design, the following steps were 
taken. 

First, all circuits were designed to respond to the 
level of their inputs and not to require a transition to 
trigger their operation. Second, to insure that each logic 
event worked only with stable, non-fluctuating input levels, 
the inputs to each event were gated. The input gates were 
opened only after the inputs were stable (i.e. the outputs 
of the previous event were stable) and closed before the 
input gates of the previous event were opened. Third, a 
dual mode latch was used to store the output of each logic 
event. In the normal mode of operation, the register 
latches the outputs of one logic event in parallel and 
stores them to be used as inputs for the next logic event. 
In its secondary mode of operation, the register stops 
taking its parallel inputs and starts to run as a shift 
register, shifting its contents onto an output pad. 
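
The two register modes described above can be sketched behaviorally. The following is a minimal Python model, not the thesis circuit: the class name, method names, and the single `clock` call per cycle are assumptions made for illustration, and the two-phase timing of the real latch pair is not modeled. It only shows parallel loading when the control signal is low and shifting toward the output pad when it is high.

```python
class DualModeRegister:
    """Behavioral sketch of a dual-mode (parallel-load / serial-shift) register."""

    def __init__(self, width):
        # Index 0 is the leftmost latch; the last index drives the output pad.
        self.bits = [0] * width

    def clock(self, control, parallel_in=None):
        if control == 0:
            # Normal mode: latch every parallel input at once.
            self.bits = list(parallel_in)
        else:
            # Shift mode: each latch takes the value of its left neighbor.
            # The leftmost shift-in is tied to ground (logical 0).
            self.bits = [0] + self.bits[:-1]

    @property
    def serial_out(self):
        # The rightmost latch is always visible at the output pad.
        return self.bits[-1]


def scan_out(reg):
    """Shift the whole register out; return bits in the order seen at the pad."""
    seen = []
    for _ in range(len(reg.bits)):
        seen.append(reg.serial_out)
        reg.clock(control=1)
    return seen
```

After a complete scan the register holds all zeros, because the grounded shift-in fills the latches from the left.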

One of the consequences of using the LSSD technique is 
the large amount of area consumed by the dual mode regis- 
ters. In high speed operation, an inverter pair would be 
sufficient to store inter-event results. But to permit low 
speed testing where the capacitance of a gate may discharge 
during one clock phase, and provide the dual mode feature, a 
pair of clocked latches with control circuits is required. 

C. LAYOUT DESIGN 

With the logic decided upon, the next step was to create 
the layout of the adder. The logic consisted of four events 
to produce the sum. Another event was needed to latch the 
input data onto the chip. A two-phase clock was needed to 
insure that two adjacent events did not run simultaneously 
(insuring stable inputs to each event). To make the output 
of the adder compatible with the input to another adder, a 
one event delay was added. This insures that the output of 
one adder does not change while a second adder is using the 
sum from the first as an input. With two 16-bit addend 
inputs, one carry-in input, one power supply (Vdd) input, 
one reference (GND) input, a 16-bit sum output, one carry- 
out output, and two clock inputs, ten pads were left from a 
standard 64-pin chip for register mode control input and 
register (shift mode) output. Since the design called for 
five registers, one for each logic event and one for 
latching the input data, five pads were used for input of 
the register mode control signals and five were used for the 
registers to serially output their contents. With the 
required inputs and outputs identified, the preliminary floor 
plan shown in Figure 4.2 was created. 

Figure 4.2 Preliminary Chip Floorplan. 

The first circuit designed was the dual mode latch of 
Figure 4.3. Here the circuit is designed to latch the IN 
level when Control is low (its complement is high) and phi1 
is high (its complement is low). When phi1 goes low, a copy of the input is 
also stored in the second latch and becomes available at 
shift-out which is connected to shift-in of the next latch. 
When control goes high, the IN signal is blocked and the 
latch takes its input from the register to the left. The 
shift-in of the leftmost latch in a register is tied to 
ground. Versatec plots of the actual layouts of this dual 
mode latch and the other circuits described in this section 
are given in Appendix E. 

The AND gate used was constructed from a NAND gate 
followed by an inverter as shown in Figure 4.4. Similarly, 
the OR gate was constructed from a NOR gate followed by an 
inverter (see Figure 4.5). Although logic implemented using 
these AND and OR gates is more area consuming than the same 
logic implemented in NAND and NOR gates only, the penalty is 
not severe because they were used infrequently in the final 
design. 




Figure 4.4 AND Gate. 




Figure 4.5 OR Gate. 

The exclusive OR gate (XOR) was constructed from two 
inverters and three NAND gates as shown in Figure 4.6. 
Though this design is considerably more area consuming than 
the XOR gate of Figure 3.1, it was selected because the RNL 
circuit simulator could correctly model its operation. 




Figure 4.6 Exclusive OR Gate. 
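
The gate constructions described above can be checked with a short truth-table sketch. The Python below is illustrative only. The AND and OR definitions follow the text directly. The XOR wiring is one arrangement that uses two inverters and three NAND gates; it is an assumption, since the exact connections of Figure 4.6 are not given in the text.

```python
def nand(a, b):
    return 1 - (a & b)

def nor(a, b):
    return 1 - (a | b)

def inv(a):
    return 1 - a

def and_gate(a, b):
    # NAND followed by an inverter.
    return inv(nand(a, b))

def or_gate(a, b):
    # NOR followed by an inverter.
    return inv(nor(a, b))

def xor_gate(a, b):
    # Two inverters and three NAND gates (assumed wiring):
    # NAND(NAND(a, not b), NAND(not a, b)) = a*(not b) + (not a)*b
    return nand(nand(a, inv(b)), nand(inv(a), b))

def truth_table(gate):
    """Return the gate outputs for inputs 00, 01, 10, 11."""
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]
```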

More complex logic functions were implemented using 
programmed logic arrays (PLA) where the outputs are the 
logical sum (OR) of the products (AND) of inputs. A single 
phase design was needed. A PLA designed to compute when 
phi1 is high, between the time the preceding event had 
produced stable outputs (phi2 going low) and the time phi1 
goes low, had to produce the proper sum-of-products results. 
To hold down fanout, a dynamic structure was needed so that 
inputs could be applied to a single type of transistor. To 
prevent steady state power consumption a precharged dynamic 
structure was needed. Because of charge sharing, the 
precharging must take place while the inputs are present on the 
transistor gates of the PLA (see chapter 5, section C, for a 
complete explanation of the charge sharing problem in this 
PLA structure). Thus, two distinct events must occur during 
this time period. First, the inputs must be applied and 
precharging must take place. Then evaluation must occur. 
To cause these two events to occur during a single phase of 
the clock, the inter-phase time when both phi1 and phi2 are 
low must be utilized for precharging. The basic structure 
of the resulting PLA is shown in Figure 4.7. 




Figure 4.7 PLA Structure. 
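
The sum-of-products function that a PLA computes can be sketched at the logic level. The Python below models only the AND plane and OR plane. It does not model precharging, clock phases, or charge sharing. The data layout and the example personality (a one-bit full adder with 3 inputs, 7 products, and 2 outputs) are assumptions chosen for illustration and are not one of the PLAs on the chip.

```python
def evaluate_pla(inputs, and_plane, or_plane):
    """Evaluate a PLA described by its AND plane and OR plane.

    and_plane: list of product terms; each term maps an input index to the
               logic value that input must have for the term to be true.
    or_plane:  list of outputs; each output lists the product-term indices
               that are ORed together.
    """
    products = [all(inputs[i] == v for i, v in term.items())
                for term in and_plane]
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Hypothetical personality: inputs are (a, b, cin); outputs are (carry, sum).
FULL_ADDER_AND = [
    {0: 1, 1: 1},        # a b
    {0: 1, 2: 1},        # a cin
    {1: 1, 2: 1},        # b cin
    {0: 1, 1: 0, 2: 0},  # a b' cin'
    {0: 0, 1: 1, 2: 0},  # a' b cin'
    {0: 0, 1: 0, 2: 1},  # a' b' cin
    {0: 1, 1: 1, 2: 1},  # a b cin
]
FULL_ADDER_OR = [
    [0, 1, 2],       # carry
    [3, 4, 5, 6],    # sum
]
```

For example, `evaluate_pla((1, 1, 0), FULL_ADDER_AND, FULL_ADDER_OR)` gives a carry of 1 and a sum of 0.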

Referring back to the floorplan in Figure 4.2, the 
layouts of the circuits which perform the logic of each event 
are presented in Appendix E. The names assigned to the 
layouts are given below. Event 1 consists of a 33-bit dual- 
mode latch. Event 2, which computes the P and G primitives 
for each bit, is made up of 16 AND gates, 16 XOR gates, and 
another 33-bit latch. Event 3, which computes the BP and BG 
primitives, the IES(i)'s and the IC23 for each 4-bit block, 
is made up of four instances of PLA82 and a 29-bit latch. 
The circuit PLA82 is made up of an 8-input, 5-product, 
2-output PLA, two XOR gates, one AND gate, and one OR gate. 
Event 4, which computes the ES(i)'s and BC for each 4-bit 
block, uses four instances of PLA84 to compute the ES(i)'s 
and one instance of PLA915 to compute the BC(i)'s and a 
21-bit latch. The circuit PLA915 is a 9-input, 15-product, 
5-output PLA and the circuit PLA84 is an 8-input, 7-product, 
4-output PLA. Event 5 uses four instances of PLA104 to 
compute the S(i)'s and a 17-bit latch to store results and 
provide the added delay (by taking the output from the shift 
out position, the extra clock cycle of delay is generated). 
The circuit PLA104 is a 10-input, 14-product, 4-output PLA. 
With this design, the input to output latency is three full 
cycles of a two-phase non-overlapping clock; three cycles of 
the clock elapse between the time the addends are presented 
to the chip and the time the sum becomes available at the 
output. In the first three registers the odd number of bits 
is due to the need to store the carry-in value until event 
4. In the last two registers the odd number of bits is due 
to the need to store the computed value of carry-out. 

The resulting final layout of Figure 4.8 shows the 
actual on-chip layout locations of each event’s logic. In 
addition to the logic circuits for each event, the circuits 
AMP and AMP5 are also seen. These are driver circuits for 
the high fanout control and clock signals. Each takes as 
its input a control signal and produces as outputs the 
control signal and its inverse, both driven by 3-micron x 
160-micron transistors. This amplifier is the same design 
used by the output pads to drive off chip loads. 

This final layout represents one implementation of a 
pipelined CLA adder designed for testability. The relative 
merits of this design and others that may have been 
implemented can, as yet, only be qualitatively discussed. The 
addition of SPICE 2G7 to the CAD toolbag will provide future 
CMOS designers with the quantitative analysis necessary to 
make decisions involving tradeoffs among primary design 
objectives. 

Figure 4.8 Final Layout. 

This final design, when simulated using RNL, functioned 
properly at clock speeds up to 14 megahertz. Testing of the 
actual chips produced by MOSIS should give an indication of 
the accuracy of RNL’s predictions. The following chapter 
presents a test plan to check for proper operation of the 
adder at low clock rates and to determine the maximum oper- 
ating speed. 

V. TEST PLAN 



After several iterations of the design-simulate-redesign 
loop, a final layout was achieved for the 16-bit pipelined 
adder. These iterations provide considerable confidence in 
the logical correctness of the layout. Appendix D contains 
RNL simulation results for the full adder. In reading these 
results it should be kept in mind that the adder requires 
three cycles of the two-phase clock to produce the sum. In 
the first part of the simulation, the inputs were kept 
constant for three clock cycles to facilitate easier reading 
of the results. With these steady inputs, simulations were 
run to verify the generation of correct sums, concentrating 
on those addends that would produce carry propagates and 
carry generates across the boundaries of the 4-bit blocks. 
The last part of the simulation utilized different inputs 
each clock cycle. This was done to test the pipelining 
feature of the design, insuring no dependence on repeated 
inputs of the addends to produce the proper sum. 

After fabrication of the chip, application of similar 
inputs to make the same determinations for the actual 
circuits will form the initial portion of the test plan. In 
this chapter a test plan for the verification of computa- 
tional correctness and speed will be presented. 

A. INPUTS AND OUTPUTS 

The first step in testing the chip will be to connect it 
to the required input and output circuitry. To accomplish 
this, the identity of the inputs and outputs on each pin 
must be determined. Microscopic examination of the chip 
will reveal the logo "16-bit Add", located between the GND 
and Vdd buses for the pads in the northeast corner (see 
Figure 4.8, which is repeated below for convenience). Using 
this landmark, the signals on the pads can be labeled as 
follows. 

Figure 4.8 (repeated) Final Layout 

The western edge has sixteen input pads for the addend 
A, with the least significant bit, A(1), located at the 
northern end. The northern edge of the chip also has 
sixteen input pads for the addend B, with the least signifi- 
cant bit, B(1), located at the eastern end. The southern 
edge has fourteen output pads and two input pads. At its 
western end is the GND input pad followed by fourteen output 
pads for S(16), the most significant bit of the sum, through 
S(3). Following S(3), at the eastern end is the input pad 
for Vdd. The eastern edge of the chip has eight input pads 
and eight output pads. Starting at the northern end, there 
are input pads for phi1, phi2, Cin, CON1 (control signal for 
the dual mode register of event 1), CON2, CON3, CON4, and 
CON5. They are followed by output pads for SREG1 (serial 
output from dual mode register of event 1), SREG2, SREG3, 
SREG4, SREG5, Cout, S(2), and S(1) at the southern end. 
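
When wiring a test harness it may help to hold the pad assignment in a small table and check it mechanically. The Python sketch below encodes the description above. The function names are hypothetical, and the assumption that bit numbers run in order along each edge between the stated end points is ours; only the end points are given in the text.

```python
def build_pad_map():
    """Return pad names per chip edge, following the written description."""
    control = [f"CON{i}" for i in range(1, 6)]
    serial = [f"SREG{i}" for i in range(1, 6)]
    return {
        # North to south; A(1) is at the northern end.
        "west": [f"A({i})" for i in range(1, 17)],
        # West to east; B(1) is at the eastern end.
        "north": [f"B({i})" for i in range(16, 0, -1)],
        # West to east; GND, then S(16) through S(3), then Vdd.
        "south": ["GND"] + [f"S({i})" for i in range(16, 2, -1)] + ["Vdd"],
        # North to south; eight inputs followed by eight outputs.
        "east": ["phi1", "phi2", "Cin"] + control + serial
                + ["Cout", "S(2)", "S(1)"],
    }


def check_pad_map(pads):
    """Confirm that 64 distinct pads are assigned, 16 per edge."""
    names = [name for edge in pads.values() for name in edge]
    assert len(names) == 64, "a 64-pin frame is expected"
    assert len(set(names)) == 64, "each signal should appear once"
    assert all(len(edge) == 16 for edge in pads.values())
    return names
```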

To supply power to the chip, +5 volts DC should be 
applied to the Vdd pad and 0 volts to the GND pad. All 
logical inputs including clocks and control signals should 
be either GND for a logical 0 or Vdd for a logical 1. 
Simulation with RNL revealed some restrictions on the clock 
signals. For proper operation, each clock should remain 
high for a minimum of 20 nanoseconds and the clock inter- 
phase time, when both phi1 and phi2 are low, must be at 
least 10 nanoseconds in duration. For initial testing, to 
insure that charge sharing problems caused by too short an 
interphase time, and fanout problems caused by too short a 
clock phase duration, are not interpreted as fabrication 
errors, the clock speed should be adjusted so that both 
above clock parameters are exceeded by one order of 
magnitude. 
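
The clock settings for initial testing follow from simple arithmetic on the two limits above. The Python sketch below assumes a symmetric two-phase clock in which both phases have the same high time and both interphase gaps are equal; that symmetry and the function names are assumptions, not requirements stated in the text.

```python
MIN_PHASE_HIGH_NS = 20   # minimum high time of each clock (from RNL simulation)
MIN_INTERPHASE_NS = 10   # minimum time with both clocks low


def clock_period_ns(phase_high_ns, interphase_ns):
    """Period of one full cycle: phi1 high, gap, phi2 high, gap."""
    return 2 * (phase_high_ns + interphase_ns)


def initial_test_clock(margin=10):
    """Clock parameters relaxed by the given factor for initial testing."""
    high = margin * MIN_PHASE_HIGH_NS
    gap = margin * MIN_INTERPHASE_NS
    period = clock_period_ns(high, gap)
    return {"phase_high_ns": high, "interphase_ns": gap,
            "period_ns": period, "frequency_mhz": 1000.0 / period}
```

With the order of magnitude margin recommended above, this gives a 200 ns phase, a 100 ns interphase gap, and a 600 ns period, or a clock rate of roughly 1.67 MHz.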

The outputs, like the inputs, are at Vdd to represent a 
logical 1 and at GND to represent a logical 0. The circuits 
used to measure the outputs should have high input 
impedance, on the order of one megohm. The output pads of 
the adder are not designed to handle the current source and 
sink requirements of transistor-transistor logic integrated 
circuits. The output measurement circuits should be 
constructed using NMOS or CMOS devices that are designed to 
operate between +5 volts DC and ground. 

B. TESTING FOR CORRECT OPERATION 

After connecting the adder to a test harness, the next 
step is to verify the generation of correct sums by the 
adder. There are several inputs that should be included in 
the testing to verify the correct operation of individual 
circuits. These are contained in Appendix F. In addition 
to the test vectors of Appendix F, several randomly selected 
input vectors should be tested. If the adder should fail to 
generate correct sums, the LSSD features can be employed to 
examine intermediate results. 
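
Expected sums for any test vector can be generated by a small reference model. The Python sketch below is such a model, not part of the thesis test plan. The three-cycle latency is taken from the text; the exact cycle on which a result is sampled, and all names used, are assumptions.

```python
from collections import deque


def reference_add(a, b, cin):
    """Expected (sum, carry_out) of a 16-bit add with carry-in."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + (cin & 1)
    return total & 0xFFFF, total >> 16


def expected_stream(vectors, latency=3):
    """Expected outputs per clock cycle for a pipelined adder.

    vectors is a list of (a, b, cin) tuples applied one per cycle.
    The result for the vector applied on cycle n is expected on cycle
    n + latency; earlier cycles yield None (pipeline not yet filled).
    """
    pipe = deque([None] * latency)
    outputs = []
    for a, b, cin in vectors:
        pipe.append(reference_add(a, b, cin))
        outputs.append(pipe.popleft())
    return outputs
```

Applying a new vector every cycle, as in the last part of the simulation, exercises the pipelining; the model then predicts which sum should be visible on each cycle.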

1. Intermediate Results 

With the LSSD design, a tester can leave input 
levels constant for a long period of time and use the shift 
mode of the internal registers to examine the internal state 
of the chip. The rightmost bit of each register is always 
available at the output pad for that register. To obtain 
the contents of the other bits, the control signal for the 
given register is set to and held at logical 1 while the 
clock continues to run. For registers 1, 3, and 5 the 
serial output will be meaningful and stable while phi2 is 
high. The serial output of registers 2 and 4 will be stable 
when phi1 is high. Table 3 lists in order the intermediate 
values available at the SREG(n) output pad when the input 
CONn is high. 

TABLE 3 

Register Serial Outputs 

Clock 
Cycle   SREG1   SREG2   SREG3    SREG4   SREG5 

  0     B1      P1      BP1      Cin     S1 
  1     B2      P2      IES3     BC2     S3 
  2     B3      P3      IES4     Cout    S5 
  3     B4      P4      BG2      ES2     S7 
  4     B5      P5      IES5     ES4     S9 
  5     B6      P6      IES6     ES6     S11 
  6     B7      P7      IC67     ES8     S13 
  7     B8      P8      BP3      ES10    S15 
  8     B9      P9      IES11    ES12    0 
  9     B10     P10     IES12    ES14    Cout 
 10     B11     P11     BG4      ES16    S2 
 11     B12     P12     IES13    BC1     S4 
 12     B13     P13     IES14    BC3     S6 
 13     B14     P14     IC1415   ES1     S8 
 14     B15     P15     BG1      ES3     S10 
 15     B16     P16     IES1     ES5     S12 
 16     A1      G1      IES2     ES7     S14 
 17     A2      G2      IC23     ES9     S16 
 18     A3      G3      BP2      ES11    0 
 19     A4      G4      IES7     ES13    0 
 20     A5      G5      IES8     ES15    0 
 21     A6      G6      BG3      0       0 
 22     A7      G7      IES9     0       0 
 23     A8      G8      IES10    0       0 
 24     A9      G9      IC1011   0       0 
 25     A10     G10     BP4      0       0 
 26     A11     G11     IES15    0       0 
 27     A12     G12     IES16    0       0 
 28     A13     G13     Cin      0       0 
 29     A14     G14     0        0       0 
 30     A15     G15     0        0       0 
 31     A16     G16     0        0       0 
 32     Cin     Cin     0        0       0 
 33     0       0       0        0       0 
 34     0       0       0        0       0 
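
A captured serial stream can be turned back into named values using the order given in Table 3. The Python sketch below does this for SREG1 only, whose order is B1 through B16, then A1 through A16, then Cin. The function name is hypothetical, and the sketch assumes the first captured bit corresponds to clock cycle 0 of the table.

```python
# Order of values at the SREG1 pad, per Table 3 (cycles 0 through 32).
SREG1_ORDER = ([f"B{i}" for i in range(1, 17)]
               + [f"A{i}" for i in range(1, 17)]
               + ["Cin"])


def decode_sreg1(bits):
    """Rebuild the latched addends and carry-in from SREG1 serial output.

    bits is the sequence of logic values sampled at the SREG1 pad,
    one per clock cycle, starting at cycle 0. Bit 1 of each addend
    is the least significant bit.
    """
    if len(bits) < len(SREG1_ORDER):
        raise ValueError("need at least 33 captured bits")
    named = dict(zip(SREG1_ORDER, bits))
    a = sum(named[f"A{i}"] << (i - 1) for i in range(1, 17))
    b = sum(named[f"B{i}"] << (i - 1) for i in range(1, 17))
    return a, b, named["Cin"]
```

Comparing the decoded values with the addends applied at the input pads checks the event 1 latch independently of the rest of the adder.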



C. TESTING FOR SPEED OF OPERATION 

Once the chips containing fabrication errors have been 
culled from the chip set returned by MOSIS, the task 
remaining is to determine just how fast the adder can run. 
Rather than simply increasing the clock rate until the adder 
fails, the time that phi1 and phi2 are each held high and 
the interphase time should be reduced separately. RNL 
simulation indicates that the circuit which generates S4 
within PLA104 is the limiting circuit for clock phase dura- 
tion (i.e. it requires the longest time to correctly 
evaluate its inputs). RNL simulation also indicates that 
the circuits in PLA104 which generate S1 and S4 are the 
limiting circuits for the clock interphase duration. 

Since the PLA is constructed of precharged dynamic 
circuits, the evaluation clock phase must be long enough to 
allow the inputs to drive the outputs to their proper 
values, even if the inputs are the same as those of the 
previous evaluation cycle. This allows the tester to use a 
constant input as the duration of each clock phase is 
reduced until the adder produces incorrect results. 
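
The reduction procedure just described can be sketched as a simple downward search. In the Python below, `passes` is a hypothetical callback standing in for the test harness: it applies the clock with the given duration and reports whether the adder still produces correct sums. The step size and starting value are left to the tester.

```python
def find_failure_duration(passes, start_ns, step_ns, floor_ns=0):
    """Step a clock duration downward until the adder first fails.

    passes(duration_ns) -> bool is supplied by the test harness (assumed).
    Returns (last_passing_ns, first_failing_ns); first_failing_ns is None
    if no failure is seen before the floor is reached.
    """
    last_good = None
    duration = start_ns
    while duration > floor_ns:
        if not passes(duration):
            return last_good, duration
        last_good = duration
        duration -= step_ns
    return last_good, None
```

The same routine can be run once for the phase duration with the interphase time held long, and once for the interphase time with the phase duration held long, so the two limits are measured separately.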

Determination of the clock interphase duration limit is 
more difficult. This is because the inputs to a PLA must be 
changing to cause charge sharing problems to occur. For 
example, in Figure 5.1 assume that the first set of inputs 
is in1=1, in2=0, and that this is correctly evaluated to 
produce out=0 when phi1 is high. Now assume that the next 
input is in1=0 and in2=1, which should also evaluate to 
out=0. However, if the precharge time (when the inputs are 
present on the gates of Q2 and Q3 and phi1 is still low) is 
insufficient, C2 will not be charged to Vdd when precharging 
ends (C2 was discharged to zero volts during the previous 
evaluation when in1 was high and phi1 was high). Now, when 
evaluation begins (phi1 going high) the low voltage across 
C2 causes Q5 and Q6 to interpret their input as a logical 0. 
As a result the output of the Q5-Q6 inverter pair goes high, 
causing Q8 to turn on, discharging C4 and resulting in an 
output of logical 1, which is incorrect. Table 4 lists the 
proper evaluation sequence when precharge time is sufficient 
and the improper sequence due to insufficient precharge 
time. In this table, for the inputs, output, and capacitor 
voltages a 1 indicates Vdd, 0 indicates GND, and X indicates 
somewhere in between. For the transistors, a 1 indicates 
on, a 0 indicates off, and an X indicates neither fully on 
nor fully off. Subsequent inputs of in1=0 and in2=1 may 
produce correct results since with constant inputs, each 
precharge time will add more charge to C2 until there is 
sufficient charge to allow the output of the Q5-Q6 inverter 
to remain low. 

Figure 5.1 Charge Sharing in a PLA. 

TABLE 4 

PLA Evaluation Sequences 

Proper evaluation sequence: 

phi   in    C      Q            out 
1 2   1 2   1234   1234567890 

1 0   1 0   0011   1100010001   0 
0 0   1 0   0011   0101011001   0 
0 1   0 1   0011   0101011001   0 
0 0   0 1   0111   0011011001   0 
1 0   0 1   0111   1010010001   0 

Improper evaluation sequence: 

phi   in    C      Q            out 
1 2   1 2   1234   1234567890 

1 0   1 0   0011   1100010001   0 
0 0   1 0   0011   0101011001   0 
0 1   0 1   0011   0101011001   0 
0 0   0 1   0X11   0011011001   0 
1 0   0 1   0XX0   1010XX0X10   1 

Thus, to check for charge sharing problems in the 
circuit of Figure 5.1, the inputs must alternate. Likewise, 
in PLA104 to check for charge sharing errors in output S1, 
its inputs must alternate between ES1=0, BC=0 and ES1=1, 
BC=1 as the interphase time is reduced. This can be accom- 
plished for all four instances of PLA104 simultaneously by 
alternating inputs of 

A = 0001 1001 1001 1001 
B = 0000 1000 1000 1000 
Cin = 1 

and 

A = 0000 0000 0000 0000 
B = 0000 0000 0000 0000 
Cin = 0 

To check for charge sharing errors in S4, the inputs to 
PLA104 must cycle between BC=1, S4=0, S3=S2=1, S1=0 and 
BC=0, S4=0, S3=S2=S1=1. This may be accomplished for all four 
instances of PLA104 simultaneously by alternating inputs of 

A = 0110 1110 1110 1110 
B = 0000 1000 1000 1000 
Cin = 1 

and 

A = 0111 0111 0111 0111 
B = 0000 0000 0000 0000 
Cin = 0 

This maximum speed testing assumes that RNL has correctly 
identified the slowest circuits on the chip. RNL 
simulations have indicated that the next slowest circuit 
(PLA915) is at least 20% faster than PLA104 (16.0 nsec for 
PLA915 vs. 20.1 nsec for PLA104). Also, all other PLA's 
functioned properly with a 5 nsec interphase time. 

Should PLA104 prove to be the speed limiting circuit for 
the chip, the actual failure speeds of the chip can serve as 
an indication of the accuracy of the RNL simulation for 
future designs. 



VI. CONCLUSIONS 



The experience gained in the design of the adder coupled 
with the clarity of hindsight leads to the following conclu- 
sions and recommendations. 



A. THE CMOS TECHNOLOGIES 



The CMOS technologies will play a role of steadily 
increasing importance in the VLSI designs of the future. 
MOSIS is already offering, on an experimental basis, CMOS 
Bulk p-well fabrication with a one-micron minimum feature 
size. A scalable set of design rules, to allow initial 
fabrication in 3-micron CMOS for design verification before 
the far more expensive 1-micron process is used, is being 
developed. 

In the private sector there is considerable research 
aimed at finding an insulating substrate material that does 
not have the variability and thermal problems of sapphire. 
Progress in this area will remove the drawback caused by 
latchup tendencies in CMOS Bulk. 



B. CMOS CAD TOOLS 

Though the design tools currently available at NPS consti- 
tute a complete set for the design of CMOS Bulk p-well 
circuits, the recent CAD tool set released by the 
University of Washington/Northwest VLSI Consortium, Release 
2.0 [Ref. 11], coupled with University of California at 
Berkeley Winter 1983 CAD tools, represents a more complete 
and cohesive set for CMOS design. When sufficient disk 
space on the VAX 11-780 becomes available to load the 
Release 2.0, implementation of the Release 2.0 CAD package 
is highly recommended. An added benefit of installing the 
Release 2.0 package is the cell library provided. The 
library contains several basic standard cells with known 
performance characteristics. The library also contains the 
standard pad frames used by MOSIS. Though MOSIS does not 
require the use of standard pad frames on designs submitted, 
their use does speed up fabrication. 

As mentioned earlier, as soon as SPICE 2G7 is available, 
its addition to the CAD toolbag would be most advantageous 
to a CMOS designer. 

C. DESIGN OF THE ADDER 

If the design of the adder were to be undertaken again, 
a different approach to generating the sum would probably 
have been used, especially if the new CAD tools mentioned 
above were available. The logic approach to the computation 
would still involve CLA addition, but it would be accom- 
plished using combinational logic and library cells rather 
than PLA's. Testability would probably suffer greatly, but 
effort would be made to reduce the sum generation to two 
logical events. Though the level of testability provided by 
the current design should provide considerable insight into 
CMOS Bulk p-well performance and CAD tool accuracy, there 
would be no need to repeat the investigation. 

APPENDIX A 

SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES 

CMO models for MOSIS 3-micron CMOS Bulk p-well devices: 

Fast Models 

.model n nmos vto=0.4 tox=0.7e-7 lambda=1e-7 ld=1e-6 
+xj=1.1e-6 gamma=.3 uo=500 cbd=5e-4 cbs=5e-4 
.model p pmos vto=-.4 tox=0.7e-7 lambda=1e-7 ld=1e-6 
+xj=1.1e-6 gamma=.3 uo=300 cbd=3.5e-4 cbs=3.5e-4 

Slow Models 

.model n nmos vto=1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6 
+xj=0.6e-6 gamma=1.3 uo=400 cbd=6e-4 cbs=6e-4 
.model p pmos vto=-1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6 
+xj=0.6e-6 gamma=.9 uo=200 cbd=4.1e-4 cbs=4.1e-4 

MIT Models for MOSIS 3-micron CMOS Bulk p-well devices: 

Slow - Slow 

.model nss nmos level=2 rsh=20 tox=650e-10 ld=.25e-6 
+xj=.35e-6 cj=6e-4 cjsw=4e-10 uo=475 vto=1.2 
+cgso=1.3e-10 cgdo=1.3e-10 nsub=1.5e16 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.5 ucrit=8e4 uexp=.25 
.model pss pmos level=2 rsh=80 tox=650e-10 ld=.25e-6 
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2 
+cgso=1.3e-10 cgdo=1.3e-10 nsub=5e15 tpg=-1 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.5 ucrit=8e4 uexp=.15 

Fast p-type Slow n-type 

.model nfs nmos level=2 rsh=30 tox=600e-10 ld=.25e-6 
+xj=.35e-6 cj=6.0e-4 cjsw=4.0e-10 uo=475 vto=1.2 
+cgso=1.9e-10 cgdo=1.9e-10 nsub=1.5e16 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.5 ucrit=8e4 uexp=.25 
.model pfs pmos level=2 rsh=20 tox=600e-10 ld=.40e-6 
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6 
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.3e15 tpg=-1 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.0 ucrit=8e4 uexp=.15 

Fast p-type Fast n-type 

.model nff nmos level=2 rsh=10 tox=550e-10 ld=.40e-6 
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6 
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.5e16 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.5 ucrit=8e4 uexp=.25 
.model pff pmos level=2 rsh=20 tox=550e-10 ld=.40e-6 
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6 
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.3e15 tpg=-1 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.0 ucrit=8e4 uexp=.15 

Slow p-type Fast n-type 



.model nsf nmos level=2 rsh=10 tox=600e-10 ld=.40e-6 
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6 
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.5e16 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.5 ucrit=8e4 uexp=.25 
.model psf pmos level=2 rsh=80 tox=600e-10 ld=.25e-6 
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2 
+cgso=1.2e-10 cgdo=1.2e-10 nsub=5.0e15 tpg=-1 
+vmax=5e4 pb=.7 mj=.5 mjsw=.5 
+neff=2.0 ucrit=8e4 uexp=.15 

APPENDIX B 

UNIX MANUAL ENTRY FOR RULEC 



RULEC(CAD)          CAD Toolbox User's Manual          RULEC(CAD) 

NAME 

rulec - Compile design rules for Lyra 

SYNOPSIS 

rulec [ -lo ] rules 

DESCRIPTION 

Rulec is a shell script with the following processing steps: 

i) The actual Lyra rule compiler is invoked to translate the symbolic rule 
description, rules.r, to lisp code, rules.l 

ii) The lisp compiler, Liszt, is invoked to compile rules.l to rules.o 

iii) rules.o is loaded into Lyra.proto to generate an executable lisp Lyra, 
rules. 

iv) The intermediate files rules.l, and rules.o are deleted. 

The following options are supported: 

-l (load only) No compilation is done. Previously compiled rules, rules.o, 
are loaded into Lyra.proto to generate an executable Lyra, rules. This 
option is useful mainly at Berkeley, where Lyra.proto changes frequently. 

-o (save object) Name.o is not removed. Enables 'rulec -l rules' in the 
future. 

FILES 

~cad/bin/rulec - rulec shell script 
~cad/lib/lyra/Rulecl - lisp rule compiler 
~cad/lib/lyra/Lyra.proto - Lyra sans compiled rules code. 
~cad/lib/lyra/*.r - standard rulesets. 
~cad/lib/lyra/DEFAULTS - gives default rulesets for Caesar technologies. 

SEE ALSO 

Lyra (CAD) 
Liszt (1) 

AUTHOR 

Michael Arnold. 

APPENDIX C 

PRESIM USER'S GUIDE 

Config file: used to calibrate RNL 

capm2a   .00000 
capm2p   .00000 
capma    .00006 
capmp    .00000 
cappa    .00006 
cappp    .00000 
capda    .00010 
capdp    .00060 
cappda   .00010 
cappdp   .00060 
capga    .00057 
lambda   1.0 

PRESIM User's Guide 

UW/NW VLSI Consortium 

Department of Computer Science 
University of Washington 
Seattle, WA 98195 

(This document is based on portions of the document 'User's Guide to NET, PRESIM and 
RNL/NL,' by Christopher J. Terman, Laboratory for Computer Science, M.I.T., Cambridge, MA 
02139.) 



One must first convert the .sim file to a network file suitable for use by RNL or NL - to do this 
we run PRESIM: 

presim foo.sim foo [config] options 

which converts the file foo.sim into a binary file for RNL/NL called foo. 

The -f option: 

Suppresses the sum-of-products formation. This may be desired if you think 
sum-of-products is formed wrong; otherwise the advantages of the transistor and 
node reduction make this option unattractive. 

The -e option: 

-efile min-value 

writes a list of node names and capacitances to the specified file. Only capacitances larger than 
min-value will be included. 

The -t option: 

-tfile min-value 

writes a list of transistors and RC values to the specified file - there are two entries for each 
transistor. The R's come from the size of the transistor, the C's from the source/drain capacitance. 
Only RC values larger than min-value will be included. 

The -p option: 

•prcsist/voltage 

provides a worst-case estimate of the circuit power consumption by assuming that all the pullups 
(DEP or LOWP devices with drain=VDD) are all on simultaneously. 'Voltage' specifies the supply 
voltage, for example '-p5' specifies a VDD of 5 volts. The result is printed after PRESIM completes its 
other processing. When figuring the resistance of a pullup device the 'power' characteristic resistance 
as set in the config file is used. 

The optional third file (config) specifies various electrical parameters. The internal values (the 
defaults) are a generic set. They do not reflect any particular fabrication process. (UW-NW VLSI 
NOTE: A configuration file is provided in the source code that duplicates the internal settings as an 
example of how this file could be used. In addition we note that, the resistor values are stored first 
sorted by width, then by length not by the ratio. Values not explicitly provided in the configuration 
file are estimated by linear interpolation.) The format of this file is lines of the form 

parameter value comments 

Lines beginning with ';' are treated as all comment. The parameter names and their default values 
are: 



; configuration file for 'standard' MPC process 

capm2a      .00000   ; 2nd metal capacitance - area, pf/sq-micron 
capm2p      .00000   ; 2nd metal capacitance - perimeter, pf/micron 
capma       .00003   ; 1st metal capacitance - area, pf/sq-micron 
capmp       .00000   ; 1st metal capacitance - perimeter, pf/micron 
cappa       .00004   ; poly capacitance - area, pf/sq-micron 
cappp       .00000   ; poly capacitance - perimeter, pf/micron 
capda       .00010   ; n-diffusion capacitance - area, pf/sq-micron 
capdp       .00060   ; n-diffusion capacitance - perimeter, pf/micron 
cappda      .00010   ; p-diffusion capacitance - area, pf/sq-micron 
cappdp      .00060   ; p-diffusion capacitance - perimeter, pf/micron 
capga       .00040   ; gate capacitance - area, pf/sq-micron 

lambda      2.5      ; microns/lambda (conversion from .sim file units 
                     ; to units used in cap parameters) 

lowthresh   0.3      ; logic low threshold as a normalized voltage 
highthresh  0.8      ; logic high threshold as a normalized voltage 

cntpullup   0        ; <>0 means that the capacitor formed by gate of 
                     ; pullup should be included in capacitance of output 
                     ; node 

diffperim   0        ; <>0 means do not include diffusion perimeters 
                     ; that border on transistor gates when figuring 
                     ; sidewall capacitance (*) 

subparea    0        ; <>0 means that poly over transistor region will not 
                     ; be counted as part of the poly-bulk capacitor (*) 

diffext     0        ; diffusion extension for each transistor, i.e., each 
                     ; transistor is assumed to have a rectangular source 
                     ; and drain diffusion extending diffext units wide and 
                     ; transistor-width units high. The effect of the 
                     ; diffusion extension is to add some capacitance to 
                     ; the source and drain node of each transistor - 
                     ; useful when processing the output of NET to improve 
                     ; the capacitive loading approximations without adding 
                     ; explicit load capacitors. diffext is specified in 
                     ; lambda (it will be converted using the lambda factor 
                     ; above). 

resistance channel context width length resist 
; this command specifies the equivalent resistance for a transistor 
; of type channel with the specified width and length. Transistors 
; matching this entry will have the specified resistance; linear 
; interpolation is done if the width and/or length is not matched 
; exactly. 
; channel is one of 'enh', 'dep', 'intrinsic', 'low-power', 
; 'pullup', or 'p-chan' 
; context is one of 'static', 'dynamic-high', 'dynamic-low', or 'power' 
; width is given in lambda 
; length is given in lambda 
; resist is given in ohms 

(•) These paramters should be 1 only when processing the output of 
the node extractor. They cause various corrections to be made 
to the interconnect component of a node’s capacitance - usually 
only extracted sim files have information regarding interconnect 
capacitance. 

PRESIM uses these parameters in calculating the capacitance for each electrical node and the
resistance for each transistor channel.
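The file format described above is simple enough to read with a few lines of code. The following Python sketch is illustrative only and is not part of PRESIM: the function names, the sample text, and the resistance value are assumptions, and it assumes that everything from a ';' to the end of a line is comment. The gate capacitance helper simply applies the stated units (capga in pf/sq-micron, lambda in microns/lambda).

```python
def parse_presim_config(text):
    """Parse 'parameter value comments' lines into (params, resistances)."""
    params = {}
    resistances = []  # tuples of (channel, context, width, length, ohms)
    for raw in text.splitlines():
        # Assumption: everything from ';' to the end of the line is comment.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "resistance":
            channel, context = fields[1], fields[2]
            width, length, ohms = (float(f) for f in fields[3:6])
            resistances.append((channel, context, width, length, ohms))
        else:
            params[fields[0]] = float(fields[1])
    return params, resistances


def gate_capacitance_pf(params, width_lambda, length_lambda):
    """Gate area in sq-microns times capga (pf/sq-micron)."""
    microns = params["lambda"]  # microns per lambda
    return params["capga"] * (width_lambda * microns) * (length_lambda * microns)


# Hypothetical excerpt for illustration; the resistance entry is made up.
SAMPLE = """\
; hypothetical excerpt, not the shipped file
capga  .00040  ; gate capacitance - area, pf/sq-micron
lambda 2.5     ; microns/lambda
resistance enh static 2 2 10000 ; illustrative value only
"""

params, resistances = parse_presim_config(SAMPLE)
print(params, resistances, gate_capacitance_pf(params, 2, 2))
```

With these numbers a 2 x 2 lambda gate is 5 x 5 microns, so the sketch reports roughly 0.01 pf.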
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APPENDIX D 
ADDER SIMULATION



The following two listings are: (1) the RNL command file
for the entire chip and (2) the results of running that
command file. In addition to this overall testing, all the
layouts of Appendix G were simulated individually. A nice
feature of RNL is that it indicates when a watched node
changes state. Thus, by making all the outputs of a circuit
watched nodes, RNL will provide the minimum time duration
for a clock cycle to produce the outputs (the longest time
indicated by the simulation). This can be confirmed by
running the simulation with a faster clock, resulting in
outputs of X (neither 1 nor 0) where insufficient time has
been allowed.
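The "longest time indicated by the simulation" can be picked out of a log mechanically. The Python sketch below is an illustration, not part of RNL. It assumes the log has been cleaned into lines of the form node=value @ time, and that the number after "@" is the delay in ns from the start of the current step; both are assumptions about the listing rather than documented RNL behavior. The function name is hypothetical.

```python
import re

# One watched-node transition per line, e.g. "cout=1 @ 21.1".
TRANSITION = re.compile(r"^(\w+)=([01X]) @ ([\d.]+)$")


def longest_settle_time(log_text, outputs):
    """Return the largest delay reported for any of the named output nodes."""
    worst = 0.0
    for line in log_text.splitlines():
        match = TRANSITION.match(line.strip())
        if match and match.group(1) in outputs:
            worst = max(worst, float(match.group(3)))
    return worst


sample_log = """\
Step begins @ 1375 ns.
phi2=1 @ 0
s16=0 @ 14.2
s1=0 @ 20
cout=1 @ 21.1
"""

print(longest_settle_time(sample_log, {"s16", "s1", "cout"}))
```

Clock transitions such as phi2 are excluded simply by leaving them out of the set of output names.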

RNL simulation to determine the minimum time for precharging
the PLA circuits is only slightly more involved.
For each product term in the PLA, alternating inputs are
selected that will result in the maximum amount of N+ diffusion
needing to be charged from 0 volts to Vdd. Then, as these
inputs are alternated, the PLA precharge time is reduced
until the circuit fails to produce correct results. For the
PLAs in the adder, visual inspection for the product term
with the longest precharge requirement was done by looking
for the longest N+ diffusion line which must be charged
through the maximum number of transistors. The visual
inspection results were confirmed by RNL simulations.
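The reduce-until-failure procedure described above amounts to a simple downward search. The following Python sketch is only an illustration: simulates_correctly is a hypothetical callback standing in for one RNL run at a given precharge time, and the starting value, step size, and threshold in the usage lines are made-up numbers.

```python
def minimum_precharge(simulates_correctly, start_ns, step_ns):
    """Shorten the precharge time until the circuit fails.

    simulates_correctly(t) is a hypothetical callback that returns True
    when a simulation with precharge time t (in ns) gives correct outputs.
    Returns the shortest passing time found, or None if start_ns fails.
    """
    t = start_ns
    last_good = None
    while t > 0 and simulates_correctly(t):
        last_good = t
        t -= step_ns
    return last_good


# Stand-in for the simulator: pretend the PLA needs at least 12 ns.
print(minimum_precharge(lambda t: t >= 12, start_ns=25, step_ns=1))
```

Integer nanosecond steps are used here to avoid floating-point drift in the loop; a finer search would need a tolerance.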
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act 2<- 13:b9 19- 



c r i r: . c rr H Face 1 



(log-file "chip.log")
(load "uwstd.l")
(load "uwsim.l")
(read-network "chip")

(setq nodes '(a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13
a14 a15 a16 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
b16 s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16
cin cout phi1 phi2 con1 con2 con3 con4 con5))

(chflag nodes)

l a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16
l b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16
l con1 con2 con3 con4 con5
l cin phi1 phi2

(defvec '(bin clocks phi1 phi2))

(defvec '(bin aaaa a16 a15 a14 a13 a12 a11 a10
a9 a8 a7 a6 a5 a4 a3 a2 a1))

(defvec '(bin bbbb b16 b15 b14 b13 b12 b11 b10
b9 b8 b7 b6 b5 b4 b3 b2 b1))

(defvec '(bin sum cout s16 s15 s14 s13 s12 s11 s10
s9 s8 s7 s6 s5 s4 s3 s2 s1))

(def-report '("state is now:" (vec clocks) cin cout newline
(vec aaaa) newline
(vec bbbb) newline
(vec sum)))

(defun ss (dummy)
  (step incr))

(defun cycles (a)
  (repeat i 1 a
    (setq incr 100)
    (ss '(x))
    (setq incr 250)
    (h '(phi1))
    (ss '(x))
    (setq incr 100)
    (l '(phi1))
    (ss '(x))
    (setq incr 250)
    (h '(phi2))
    (ss '(x))
    (l '(phi2))
  )
)

(cycles 5)
(invec '(aaaa 0b0000111100001111))
(invec '(bbbb 0b1111000011110000))
(cycles 3)
(invec '(bbbb 0b0000000100000001))
(cycles 3)
h cin
(cycles 3)
l cin
(invec '(aaaa 0b1111111111111111))
(invec '(bbbb 0b0000000000000000))
(cycles 3)
h cin
(cycles 3)
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Get 2° 




l cin
(invec '(bbbb 0b0000000000000001))
(cycles 3)
(invec '(bbbb 0b0000000000000000))
(cycles 1)
(invec '(aaaa 0b0000000000000000))
(cycles 1)
(invec '(bbbb 0b1111111111111111))
(cycles 1)
(invec '(aaaa 0b0000100000000000))
(cycles 1)
(invec '(bbbb 0b1111011111111111))
(cycles 1)
h cin
(cycles 1)
(invec '(aaaa 0b0000000000000000))
(invec '(bbbb 0b0000000000000000))
(cycles 1)
l cin
(cycles 4)

Dec 6 15:23 1984 chip.log Page 1



Loading uwsim.l
Done loading uwsim.l
; 3086 nodes, transistors: enh=1494 intrinsic=0 p-chan=1141 dep=0 low-power=0 pullup=0 resistor

Step begins @ 0 ns.
phi2=0 @ 0
phi1=0 @ 0
cin=0 @ 0
con5=0 @ 0
con4=0 @ 0
con3=0 @ 0
con2=0 @ 0
con1=0 @ 0
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b12=0 @ 0
b11=0 @ 0
b10=0 @ 0
b9=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
b4=0 @ 0
b3=0 @ 0
b2=0 @ 0
b1=0 @ 0
a16=0 @ 0
a15=0 @ 0
a14=0 @ 0
a13=0 @ 0
a12=0 @ 0
a11=0 @ 0
a10=0 @ 0
a9=0 @ 0
a8=0 @ 0
a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0
a3=0 @ 0
a2=0 @ 0
a1=0 @ 0

Step begins @ 10 ns.
phi1=1 @ 0

Step begins @ 35 ns.
phi1=0 @ 0

Step begins @ 45 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time = 70
clocks=0b01 cin=0 cout=X
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=X0000000000000000

Step begins @ 70 ns.
phi2=0 @ 0

Step begins @ 80 ns.
phi1=1 @ 0

Step begins @ 105 ns.
phi1=0 @ 0

Step begins @ 115 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time = 140
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000



Step begins @ 140 ns.
phi2=0 @ 0

Step begins @ 150 ns.
phi1=1 @ 0

Step begins @ 175 ns.
phi1=0 @ 0

Step begins @ 185 ns.
phi2=1 @ 0
state is now:
Current time = 210
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 210 ns.
phi2=0 @ 0

Step begins @ 220 ns.
phi1=1 @ 0

Step begins @ 245 ns.
phi1=0 @ 0

Step begins @ 255 ns.
phi2=1 @ 0
state is now:
Current time = 280
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 280 ns.
phi2=0 @ 0

Step begins @ 290 ns.
phi1=1 @ 0

Step begins @ 315 ns.
phi1=0 @ 0

Step begins @ 325 ns.
phi2=1 @ 0
state is now:
Current time = 350
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 350 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
a12=1 @ 0
a11=1 @ 0
a10=1 @ 0
a9=1 @ 0
a4=1 @ 0
a3=1 @ 0
a2=1 @ 0
a1=1 @ 0
phi2=0 @ 0

Step begins @ 360 ns.
phi1=1 @ 0

Step begins @ 385 ns.
phi1=0 @ 0

Step begins @ 395 ns.
phi2=1 @ 0
state is now:
Current time = 420
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000

Step begins @ 420 ns.
phi2=0 @ 0

Step begins @ 430 ns.
phi1=1 @ 0

Step begins @ 455 ns.
phi1=0 @ 0

Step begins @ 465 ns.
phi2=1 @ 0
state is now:
Current time = 490
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000

Step begins @ 490 ns.
phi2=0 @ 0

Step begins @ 500 ns.
phi1=1 @ 0

Step begins @ 525 ns.
phi1=0 @ 0

Step begins @ 535 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time = 560
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b01111111111111111

Step begins @ 560 ns.
b9=1 @ 0
b1=1 @ 0
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
phi2=0 @ 0

Step begins @ 570 ns.
phi1=1 @ 0

Step begins @ 595 ns.
phi1=0 @ 0

Step begins @ 605 ns.
phi2=1 @ 0
state is now:
Current time = 630
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b01111111111111111

Step begins @ 630 ns.
phi2=0 @ 0

Step begins @ 640 ns.
phi1=1 @ 0

Step begins @ 665 ns.
phi1=0 @ 0

Step begins @ 675 ns.
phi2=1 @ 0
state is now:
Current time = 700
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b01111111111111111

Step begins @ 700 ns.
phi2=0 @ 0

Step begins @ 710 ns.
phi1=1 @ 0

Step begins @ 735 ns.
phi1=0 @ 0

Step begins @ 745 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time = 770
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000



Step begins @ 770 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 780 ns.
phi1=1 @ 0

Step begins @ 805 ns.
phi1=0 @ 0

Step begins @ 815 ns.
phi2=1 @ 0
state is now:
Current time = 840
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 840 ns.
phi2=0 @ 0

Step begins @ 850 ns.
phi1=1 @ 0

Step begins @ 875 ns.
phi1=0 @ 0

Step begins @ 885 ns.
phi2=1 @ 0
state is now:
Current time = 910
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 910 ns.
phi2=0 @ 0

Step begins @ 920 ns.
phi1=1 @ 0

Step begins @ 945 ns.
phi1=0 @ 0

Step begins @ 955 ns.
phi2=1 @ 0
s1=1 @ 19.1
state is now:
Current time = 980
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010001

Step begins @ 980 ns.
a16=1 @ 0
a15=1 @ 0
a14=1 @ 0
a13=1 @ 0
a8=1 @ 0
a7=1 @ 0
a6=1 @ 0
a5=1 @ 0
b9=0 @ 0
b1=0 @ 0
cin=0 @ 0
phi2=0 @ 0

Step begins @ 990 ns.
phi1=1 @ 0

Step begins @ 1015 ns.
phi1=0 @ 0

Step begins @ 1025 ns.
phi2=1 @ 0
state is now:
Current time = 1050
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001

Step begins @ 1050 ns.
phi2=0 @ 0

Step begins @ 1060 ns.
phi1=1 @ 0

Step begins @ 1085 ns.
phi1=0 @ 0

Step begins @ 1095 ns.
phi2=1 @ 0
state is now:
Current time = 1120
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001

Step begins @ 1120 ns.
phi2=0 @ 0

Step begins @ 1130 ns.
phi1=1 @ 0

Step begins @ 1155 ns.
phi1=0 @ 0

Step begins @ 1165 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
state is now:
Current time = 1190
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1190 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 1200 ns.
phi1=1 @ 0

Step begins @ 1225 ns.
phi1=0 @ 0

Step begins @ 1235 ns.
phi2=1 @ 0
state is now:
Current time = 1260
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1260 ns.
phi2=0 @ 0

Step begins @ 1270 ns.
phi1=1 @ 0

Step begins @ 1295 ns.
phi1=0 @ 0

Step begins @ 1305 ns.
phi2=1 @ 0
state is now:
Current time = 1330
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1330 ns.
phi2=0 @ 0

Step begins @ 1340 ns.
phi1=1 @ 0

Step begins @ 1365 ns.
phi1=0 @ 0

Step begins @ 1375 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
state is now:
Current time = 1400
clocks=0b01 cin=1 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1400 ns.
b1=1 @ 0
cin=0 @ 0
phi2=0 @ 0

Step begins @ 1410 ns.
phi1=1 @ 0

Step begins @ 1435 ns.
phi1=0 @ 0

Step begins @ 1445 ns.
phi2=1 @ 0
state is now:
Current time = 1470
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1470 ns.
phi2=0 @ 0

Step begins @ 1480 ns.
phi1=1 @ 0

Step begins @ 1505 ns.
phi1=0 @ 0

Step begins @ 1515 ns.
phi2=1 @ 0
state is now:
Current time = 1540
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1540 ns.
phi2=0 @ 0

Step begins @ 1550 ns.
phi1=1 @ 0

Step begins @ 1575 ns.
phi1=0 @ 0

Step begins @ 1585 ns.
phi2=1 @ 0
state is now:
Current time = 1610
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1610 ns.
b1=0 @ 0
phi2=0 @ 0

Step begins @ 1620 ns.
phi1=1 @ 0

Step begins @ 1645 ns.
phi1=0 @ 0

Step begins @ 1655 ns.
phi2=1 @ 0
state is now:
Current time = 1680
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1680 ns.
a16=0 @ 0
a15=0 @ 0
a14=0 @ 0
a13=0 @ 0
a12=0 @ 0
a11=0 @ 0
a10=0 @ 0
a9=0 @ 0
a8=0 @ 0
a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0
a3=0 @ 0
a2=0 @ 0
a1=0 @ 0
phi2=0 @ 0

Step begins @ 1690 ns.
phi1=1 @ 0

Step begins @ 1715 ns.
phi1=0 @ 0

Step begins @ 1725 ns.
phi2=1 @ 0
state is now:
Current time = 1750
clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1750 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b12=1 @ 0
b11=1 @ 0
b10=1 @ 0
b9=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
b4=1 @ 0
b3=1 @ 0
b2=1 @ 0
b1=1 @ 0
phi2=0 @ 0

Step begins @ 1760 ns.
phi1=1 @ 0

Step begins @ 1785 ns.
phi1=0 @ 0

Step begins @ 1795 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
cout=0 @ 22.9
state is now:
Current time = 1820
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b1111111111111111
sum=0b01111111111111111

Step begins @ 1820 ns.
a12=1 @ 0
phi2=0 @ 0

Step begins @ 1830 ns.
phi1=1 @ 0

Step begins @ 1855 ns.
phi1=0 @ 0

Step begins @ 1865 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time = 1890
clocks=0b01 cin=0 cout=0
aaaa=0b0000100000000000
bbbb=0b1111111111111111
sum=0b00000000000000000



Step begins @ 1890 ns.
b12=0 @ 0
phi2=0 @ 0

Step begins @ 1900 ns.
phi1=1 @ 0

Step begins @ 1925 ns.
phi1=0 @ 0

Step begins @ 1935 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time = 1960
clocks=0b01 cin=0 cout=0
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b01111111111111111

Step begins @ 1960 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 1970 ns.
phi1=1 @ 0

Step begins @ 1995 ns.
phi1=0 @ 0

Step begins @ 2005 ns.
phi2=1 @ 0
s16=0 @ 14.2
s13=0 @ 16.4
s15=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
cout=1 @ 21.1
state is now:
Current time = 2030
clocks=0b01 cin=1 cout=1
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b10000011111111111

Step begins @ 2030 ns.
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b11=0 @ 0
b10=0 @ 0
b9=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
b4=0 @ 0
b3=0 @ 0
b2=0 @ 0
b1=0 @ 0
a12=0 @ 0
phi2=0 @ 0

Step begins @ 2040 ns.
phi1=1 @ 0

Step begins @ 2065 ns.
phi1=0 @ 0

Step begins @ 2075 ns.
phi2=1 @ 0
s16=1 @ 14.6
s13=1 @ 16.7
s15=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
cout=0 @ 22.9
state is now:
Current time = 2100
clocks=0b01 cin=1 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 2100 ns.
cin=0 @ 0
phi2=0 @ 0

Step begins @ 2110 ns.
phi1=1 @ 0

Step begins @ 2135 ns.
phi1=0 @ 0

Step begins @ 2145 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
state is now:
Current time = 2170
clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000



Step begins @ 2170 ns.
phi2=0 @ 0

Step begins @ 2180 ns.
phi1=1 @ 0

Step begins @ 2205 ns.
phi1=0 @ 0

Step begins @ 2215 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time = 2240
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 2240 ns.
phi2=0 @ 0

Step begins @ 2250 ns.
phi1=1 @ 0

Step begins @ 2275 ns.
phi1=0 @ 0

Step begins @ 2285 ns.
phi2=1 @ 0
s1=0 @ 20
state is now:
Current time = 2310
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 2310 ns.
phi2=0 @ 0

Step begins @ 2320 ns.
phi1=1 @ 0

Step begins @ 2345 ns.
phi1=0 @ 0

Step begins @ 2355 ns.
phi2=1 @ 0
state is now:
Current time = 2380
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

exit
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LAYOUTS 



LEGEND

Contact Cut

[Layout figures not reproduced. Surviving labels: XOR Gate; CON (n), with outputs "out" and "shift out".]

APPENDIX F
TEST VECTORS

Addend A           Addend B           Cin  Sum
msb          lsb   msb          lsb        msb           lsb

initialize all internal nodes
0000000000000000   0000000000000000   0    xxxxxxxxxxxxxxxxx
0000000000000000   0000000000000000   0    xxxxxxxxxxxxxxxxx
0000000000000000   0000000000000000   0    00000000000000000

test for proper P and G primitives
0000000000000000   1111111111111111   0    01111111111111111
1111111111111111   0000000000000000   0    01111111111111111
0101010101010101   1010101010101010   0    01111111111111111
1010101010101010   0101010101010101   0    01111111111111111

test for proper IES
0001000100010001   0000000000000000   0    00001000100010001
0001000100010001   0001000100010001   0    00010001000100010
0101010101010101   0001000100010001   0    00110011001100110
0101010101010101   0101010101010101   0    01010101010101010

test for proper IC23
0101010101010101   0011001100110011   0    01000100010001000
0010001000100010   0011001100110011   0    00101010101010101

test for carry from block to block
0000000000001111   0000000000000001   0    00000000000010000
0000000000001111   0000000000000000   1    00000000000010000
0000000011111111   0000000000000000   1    00000000100000000
0000000011111111   0000000000000001   0    00000000100000000
0000000011111111   0000000000010000   0    00000000100001111
0000111111111111   0000000000000000   1    00001000000000000
0000111111111111   0000000000000001   0    00001000000000000
0000111111111111   0000000000010000   0    00001000000001111
0000111111111111   0000000100000000   0    00001000011111111
1111111111111111   0000000000000000   1    10000000000000000
1111111111111111   0000000000000001   0    10000000000000000
1111111111111111   0000000000010000   0    10000000000001111
1111111111111111   0000000100000000   0    10000000011111111
1111111111111111   0001000000000000   0    10000111111111111
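
Each Sum entry in these test vectors is the 17-bit result (carry out followed by 16 sum bits) of Addend A plus Addend B plus Cin. The Python sketch below is a hypothetical reference model, not part of the original test procedure, that can be used to regenerate or check the expected Sum column. The function name is an assumption.

```python
def expected_sum(a_bits, b_bits, cin):
    """Return the 17-bit sum (carry out, then s16..s1) as a binary string.

    a_bits and b_bits are 16-character binary strings, msb first;
    cin is 0 or 1.
    """
    total = int(a_bits, 2) + int(b_bits, 2) + cin
    return format(total, "017b")


# Two rows from the carry-propagation group of the table.
print(expected_sum("0000000000001111", "0000000000000001", 0))
print(expected_sum("1111111111111111", "0001000000000000", 0))
```

The model ignores the pipeline delay of the chip, so it gives the value the outputs settle to, not the cycle in which they appear.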
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